Abstract

We show how to turn a regular expression into an $O(s)$ space representation of McNaughton and
Yamada's NFA, where $s$ is the number of NFA states. The standard adjacency list representation of
McNaughton and Yamada's NFA takes up $s + s^2$ space in the worst case. The adjacency list representation
of the NFA produced by Thompson takes up between $2r$ and $5r$ space, where $r \ge s$ in general, and $r$ can
be arbitrarily larger than $s$. Given any set $T$ of NFA states, our representation can be used to compute
the set $N$ of states one transition away from the states in $T$ in optimal time $O(|T| + |N|)$. McNaughton
and Yamada's NFA requires $\Theta(|T| \times |N|)$ in the worst case. Using Thompson's NFA, the equivalent
calculation requires $\Theta(r)$ time in the worst case.

An  implementation  of  our  NFA  representation  confirms  that  it  takes  up  an  order  of  magnitude  less 
space  than  McNaughton  and  Yamada's  machine.  An  implementation  to  produce  a  DFA  from  our  NFA 
representation  by  subset  construction  shows  linear  and  quadratic  speedups  over  subset  construction 
starting  from  both  Thompson's  and  McNaughton  and  Yamada's  NFA's.  It  also  shows  that  the  DFA 
produced  from  our  NFA  is  as  much  as  one  order  of  magnitude  smaller  than  DFA's  constructed  from  the 
two  other  NFA's. 

1      Introduction 

The growing importance of regular languages and their associated computational problems in languages
and compilers is underscored by the granting of the Turing Award to Rabin and Scott in 1976, in part, for
their groundbreaking logical and algorithmic work in regular languages [16]. Of special significance was
their construction of the canonical minimum state DFA that had been described nonconstructively in the
proof of the Myhill-Nerode Theorem [14,15]. Rabin and Scott's work, which was motivated by theoretical
considerations, has gained in importance as the number of practical applications has grown. In particular,
the construction of finite automata from regular expressions is of central importance to the compilation
of communicating processes [4], string pattern matching [3], model checking [8], lexical scanning [2], and VLSI
layout design [20]; unit time incremental acceptance testing in a DFA is also a crucial step in LR(k) parsing [12];
algorithms for acceptance testing and DFA construction from regular expressions are implemented in the
UNIX operating system [17].

* This research was partially supported by Office of Naval Research Grant No. N00014-90-J-1890 and Air Force Office of
Scientific Research Grant No. AFOSR-91-0308.

Throughout this paper our model of computation is a uniform cost sequential RAM [1]. We report the
following four results.

1. Recently Berry and Sethi [5] used results of Brzozowski [6] to formally derive and improve McNaughton
and Yamada's algorithm [13] for turning regular expressions into NFA's. NFA's produced by this
algorithm have fewer states than NFA's produced by Thompson's algorithm [18], and in practice they
are known to outperform Thompson's NFA's for acceptance testing. Berry and Sethi's algorithm has
two passes and can easily be implemented to run in time $\Theta(m)$ and auxiliary space $O(r)$, where $r$ is
the length of the regular expression, and $m$ is the number of edges in the NFA produced. We present
an algorithm that computes the same NFA in a single left-to-right scan over the regular expression.
It runs in the same asymptotic time $\Theta(m)$ as Berry and Sethi, but it improves the auxiliary space to
$O(s)$, where $s$ is the number of occurrences of alphabet symbols appearing in the regular expression.

2. One disadvantage of McNaughton and Yamada's NFA is that its worst case number of edges is
$m = \Theta(s^2)$, which is also a worst case space bound for the standard adjacency list implementation.
Thompson's NFA only has between $r$ and $2r$ states and between $r$ and $3r$ edges. We introduce a new
compressed representation for McNaughton and Yamada's NFA that uses only $O(s)$ space. Our
compressed NFA can be constructed from a regular expression $R$ in $O(r)$ time and $O(s)$ auxiliary space.
It supports acceptance testing in worst-case time $O(s|x|)$ for arbitrary string $x$, and a promising new
way to construct DFA's faster than the classical subset construction of Rabin and Scott.

3. Our main theoretical result is a proof that the compressed NFA can be used to compute the set of
states $N$ one edge away from an arbitrary set of states $T$ in McNaughton and Yamada's NFA in optimal
time $O(|T| + |N|)$. The previous best worst-case time is $\Theta(|T| \times |N|)$.

4. We give empirical evidence that our algorithm for NFA acceptance testing using the compressed NFA
yields a constant factor speedup over acceptance testing using Thompson's NFA, and is comparable to
McNaughton and Yamada's NFA. We give more dramatic empirical evidence that constructing a DFA
from our compressed NFA can be achieved in time one order of magnitude faster than the classical
Rabin and Scott subset construction (cf. Chapter 3 of [2]) starting from either Thompson's NFA or
McNaughton and Yamada's NFA. Our benchmarks also show subset construction being faster when it
starts from Thompson's machine than from McNaughton and Yamada's NFA.

The  next  two  sections  present  standard  terminology  and  background  material,  and  can  be  skipped  by 
anyone  who  knows  Chapter  3  of  [2].  Section  4  reformulates  McNaughton  and  Yamada's  algorithm  from  an 
automata  theoretic  point  of  view.  Section  5  describes  our  new  algorithm  to  turn  a  regular  expression  into 
McNaughton  and  Yamada's  NFA.  In  Section  6  we  show  how  to  construct  a  compressed  form  of  this  NFA. 
Analysis of the compressed NFA is presented in Theorem 6, which is our main theoretical result. In Section
7, we show how to further compress our NFA. Section 8 discusses experimental results showing how our
compressed  NFA  compares  with  other  NFA's  in  solving  acceptance  testing  and  DFA  construction.  Section  9 
mentions  future  research. 

2      Terminology 

The following basic definitions and terminology can be found in [10]. By an alphabet we mean a finite
nonempty set of symbols. If $\Sigma$ is an alphabet, then $\Sigma^*$ denotes the set of all finite strings of symbols in $\Sigma$. If
$\Sigma$ is an alphabet, then any subset of $\Sigma^*$ is a language over $\Sigma$. If $L_1$ and $L_2$ are two languages, then the cross
product $L_1 L_2 = \{xy : x \in L_1, y \in L_2\}$ represents the set of all strings $xy$ that result from concatenating
each $x \in L_1$ with each $y \in L_2$. If $\lambda$ stands for the empty string, and $\emptyset$ represents the empty set, then
$L\{\lambda\} = \{\lambda\}L = L$, and $L\emptyset = \emptyset L = \emptyset$ for any language $L$.

Definition 1 Let $L_R$ be the language denoted by regular expression $R$. Let $\Sigma$ be a finite alphabet. Then the
regular expressions are the smallest set of terms that contains

• $\emptyset$ (which represents the empty set)

• $\lambda$ (which represents the set $\{\lambda\}$), where $\lambda$ is the empty string

• $a$ (which represents the set $\{a\}$) for each symbol $a \in \Sigma$

• $R \mid S$ (which represents the set $L_R \cup L_S$), where $R$ and $S$ are regular expressions

• $RS$ (which represents the cross product set $L_R L_S$), where $R$ and $S$ are regular expressions

• $R^*$ (which represents $\mathrm{lfp}\ S.\ \{\lambda\} \cup L_R S$, where $\mathrm{lfp}\ X.\ E(X)$ is the minimum value $X$ such that $X = E(X)$),
where $R$ is a regular expression.

A nondeterministic finite automaton (abbr. NFA) $M$ is a 5-tuple $(\Sigma, Q, I, F, \delta)$, where $Q$ is a set of
states, $I \subseteq Q$ is the set of initial states, $F \subseteq Q$ is the set of final states, and $\delta \subseteq Q \times (\Sigma \times Q)$ is a labeled
directed graph with vertices $Q$ and an edge labeled $a$ connecting state $q$ to state $p$ for every $[q,[a,p]]$ belonging
to $\delta$. For all $q \in Q$ and $a \in \Sigma$ we use the notation $\delta(q,a)$ to denote the set $\{p : [q,[a,p]] \in \delta\}$ of all states
reachable from state $q$ by a single edge labeled $a$. It is useful to generalize this notation with the following
rules, where $T \subseteq Q$, $s \in \Sigma^*$, and $B \subseteq \Sigma^*$:

\[ \delta(q, as) = \delta(\delta(q,a), s) \]

\[ \delta(T, s) = \bigcup_{q \in T} \delta(q, s) \]

\[ \delta(T, B) = \bigcup_{b \in B} \delta(T, b) \]

The language $L$ accepted by $M$, denoted by $L(M)$, is defined by the rule

\[ s \in L \iff \delta(I, s) \cap F \ne \emptyset \tag{1} \]

In other words, $L = \{s \in \Sigma^* \mid \delta(I,s) \cap F \ne \emptyset\}$. $M$ is a deterministic finite automaton (abbr. DFA) if graph
$\delta$ has no more than one edge with the same label leading out from each vertex, and if $I$ contains exactly one
state. Regular expressions and NFA's that represent the same regular language are said to be equivalent.
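
As a minimal sketch of rule (1), assume the NFA is stored as a Python dictionary that maps (state, symbol) pairs to sets of successor states. The names `delta_step` and `accepts`, and the two-state example machine, are illustrative assumptions and do not come from the paper.

```python
def delta_step(delta, T, a):
    """Return delta(T, a): all states one edge labeled `a` away from a state in T."""
    out = set()
    for q in T:
        out |= delta.get((q, a), set())
    return out


def accepts(delta, I, F, s):
    """Rule (1): s is in L(M) iff delta(I, s) intersects F."""
    T = set(I)
    for a in s:                      # delta(q, as) = delta(delta(q, a), s)
        T = delta_step(delta, T, a)
    return bool(T & set(F))


# Hypothetical NFA for (a|b)*b: state 0 is initial, state 1 is final.
delta = {(0, "a"): {0}, (0, "b"): {0, 1}}
print(accepts(delta, {0}, {1}, "aab"))
print(accepts(delta, {0}, {1}, "aba"))
```

Each input symbol costs one pass over the current state set, which is the per-symbol work that the bounds in the next section measure.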

3      Background 

Kleene [11] characterized regular languages equivalently in terms of languages denoted by regular expressions
and  languages  accepted  by  DFA's.  Rabin  and  Scott  [16]  showed  that  NFA's  also  characterize  the  regular 
languages,  and  their  work  led  to  algorithms  to  decide  whether  an  arbitrary  string  is  accepted  by  an  NFA. 

Let $n$ be the number of NFA states, $m$ be the number of edges, and $k$ be the alphabet size. For an NFA
represented by an adjacency matrix of size $n^2$ for each alphabet symbol, acceptance testing takes $O(n|x|)$
bit vector operations and $O(n)$ auxiliary space. Alternatively, for an NFA implemented by an adjacency
list of size $m$ with a perfect hash table [9] storing the alphabet symbols at each state, this test takes time
proportional to $m|x|$ in the worst case. For DFA's the same data structure leads to a better time bound of
$\Theta(|x|)$. However, there are NFA's for which the smallest equivalent DFA (unique up to isomorphism of state
labels as shown by Myhill [14] and Nerode [15]) has an exponentially greater number of states. Thus, the
choice between using an NFA or DFA is a space/time tradeoff.

There are two main approaches for turning regular expressions into equivalent NFA's. One, due to
Thompson [18], constructs an NFA (augmented with $\lambda$ edges) in which the number $n$ of states is somewhere
between the length $r$ of the regular expression and $2r$, and the outdegree of any state is no greater than 2.
Consequently $m = O(n)$, and the adjacency list implementation does not even require perfect hashing to
preserve the $O(n|x|)$ time bound. Thompson's construction is a simple bottom-up method that processes
the regular expression as it is parsed. The time and space are linear in $r$.

Another approach, based on Berry and Sethi's [5] improvement to McNaughton and Yamada [13],
constructs an NFA in which the number $n$ of states is precisely one plus the number $s$ of occurrences of alphabet
symbols appearing in the regular expression. In general, $s$ can be arbitrarily smaller than $r$.


For the bit matrix representation, McNaughton and Yamada's NFA can be used to solve acceptance
testing using $O(s|x|)$ bit vector operations, which is superior to the time bound for Thompson's machine.
With the adjacency list representation the worst case number of edges $m = \Omega(s^2)$ leads to a worst case
time bound $O(m|x|)$ which is one order of magnitude worse than the time bound for Thompson's machine.
However, the fact that McNaughton and Yamada's NFA is a DFA when all of the alphabet symbols are
distinct may explain, in part, why it is observed to outperform Thompson's NFA for a large subclass of the
instances. The Berry/Sethi construction scans the regular expression twice, and, with only a little effort,
both passes can be made to run in linear time and auxiliary space with respect to $r$ plus the size of the NFA
(for either adjacency list or matrix implementations).

There  is  one  main  approach  for  turning  NFA's  (constructed  by  either  of  the  two  methods  above)  into 
DFA's.  This  is  by  the  Rabin  and  Scott  subset  construction  [16]. 

4     McNaughton  and  Yamada's  NFA 

It  is  convenient  to  reformulate  McNaughton  and  Yamada's  transformation  from  regular  expressions  to 
NFA's[13]  in  the  following  way. 

Definition 2 A normal NFA (abbr. NNFA) is an NFA with one starting state $q_0$ having no edges leading
into it, and all edges leading into each state are labeled with the same symbol. For an NNFA with alphabet $\Sigma$
the transition map is represented by a binary edge relation $\delta \subseteq Q \times Q$ and assignment $A : (Q - \{q_0\}) \to \Sigma$,
where $A(q)$ is the label assigned to every edge leading into state $q$.

Definition 3 If $M = (\Sigma, Q, q_0, F, \delta, A)$ is an NNFA, then $\mathit{tail}(M) = (\Sigma,\ Q - \{q_0\},\ \delta\{q_0\},\ F - \{q_0\},\ \{[q,t] \in \delta \mid q \ne q_0\},\ A)$.

It  is  a  desirable  and  obvious  fact  (which  follows  immediately  from  the  definition  of  an  NNFA)  that  when 
A  is  one-to-one,  then  no  state  can  have  more  than  one  transition  with  the  same  label.  Hence,  such  an  NNFA 
is  a  DFA. 

We can implement McNaughton and Yamada's algorithm to turn a regular expression $R$ into an NNFA
while performing a single left-to-right shift/reduce parse of $R$ (but without actually producing a parse tree).
To explain how this is done, we use the notational convention that $M_R$ denotes an NFA equivalent to regular
expression $R$. Each time a subexpression $S$ of $R$ is reduced during parsing, $\mathit{tail}(M_S)$ is computed, where $M_S$
is an NNFA equivalent to $S$. The last step computes an NNFA $M_R$ from $\mathit{tail}(M_R)$. However, $M_R$ cannot be
computed from $\mathit{tail}(M_R)$ unless we know whether $M_R$ accepts $\lambda$, which indicates whether or not the start
state for $M_R$ is a final state.

Regular expressions are restricted if $\emptyset$ is not a subexpression. There is a linear time algorithm to
convert regular expressions into their equivalent restricted forms. Without loss of generality, we will assume
throughout this paper that regular expressions are restricted.

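Let $\mathit{null}_R = \{\lambda\}$ if $\lambda \in L_R$; otherwise, let $\mathit{null}_R = \emptyset$. If $\mathit{tail}(M_R) = (\Sigma, Q, I, F, \delta, A)$, and $q_0 \notin Q$, then
the following formula

\[ M_R = (\Sigma,\ Q \cup \{q_0\},\ \{q_0\},\ F \cup (\{q_0\}\,\mathit{null}_R),\ \delta \cup \{[q_0,y] : y \in I\},\ A) \tag{2} \]

indicates how to compute $M_R$ from $\mathit{tail}(M_R)$ and $\mathit{null}_R$.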

Theorem 1 (McNaughton and Yamada) Given any regular expression $R$ with $s$ occurrences of alphabet
symbols from $\Sigma$, we can construct an NNFA $M_R$ with $s + 1$ states.

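Proof The proof uses structural induction to show that for any regular expression $R$, we can always compute
$\mathit{tail}(M_R)$ and $\mathit{null}_R$ for some NNFA $M_R$. Then equation (2) can be used to obtain $M_R$. We assume a fixed
alphabet $\Sigma$. There are two base cases, which are easily verified.

\[ \mathit{tail}(M_\lambda) = (Q_\lambda = \emptyset,\ \delta_\lambda = \emptyset,\ A_\lambda = \emptyset,\ I_\lambda = \emptyset,\ F_\lambda = \emptyset,\ \mathit{null}_\lambda = \{\lambda\}) \tag{3} \]

\[ \mathit{tail}(M_a) = (Q_a = \{q\},\ I_a = \{q\},\ F_a = \{q\},\ \delta_a = \emptyset,\ A_a = \{[q,a]\},\ \mathit{null}_a = \emptyset), \tag{4} \]

where $a \in \Sigma$, and $q$ is a new state.

To use induction, we assume that $T$ and $S$ are two arbitrary regular expressions equivalent respectively to
NNFA's $M_T$ and $M_S$ with $\mathit{tail}(M_T) = (Q_T, I_T, F_T, \delta_T, A_T)$ and $\mathit{tail}(M_S) = (Q_S, I_S, F_S, \delta_S, A_S)$, where $Q_T$
and $Q_S$ are disjoint. Then we can easily verify that

\[ \mathit{tail}(M_{T|S}) = (Q_{T|S} = Q_T \cup Q_S,\ \delta_{T|S} = \delta_T \cup \delta_S,\ A_{T|S} = A_T \cup A_S,\ I_{T|S} = I_T \cup I_S,\ F_{T|S} = F_T \cup F_S,\ \mathit{null}_{T|S} = \mathit{null}_T \cup \mathit{null}_S) \tag{5} \]

\[ \mathit{tail}(M_{TS}) = (Q_{TS} = Q_T \cup Q_S,\ \delta_{TS} = \delta_T \cup \delta_S \cup F_T I_S,\ A_{TS} = A_T \cup A_S,\ I_{TS} = I_T \cup \mathit{null}_T I_S,\ F_{TS} = F_S \cup \mathit{null}_S F_T,\ \mathit{null}_{TS} = \mathit{null}_T\,\mathit{null}_S) \tag{6} \]

\[ \mathit{tail}(M_{T^*}) = (Q_{T^*} = Q_T,\ \delta_{T^*} = \delta_T \cup F_T I_T,\ A_{T^*} = A_T,\ I_{T^*} = I_T,\ F_{T^*} = F_T,\ \mathit{null}_{T^*} = \{\lambda\}) \tag{7} \]

Disjointness of the unions used to form the set of states for the cases $T|S$ and $TS$ proves the assertion about
the number of states. We can convert $\mathit{tail}(M_R)$ into $M_R$ using formula (2). □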
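
The inductive rules (3) to (7) and formula (2) can be sketched directly as a recursive function. The sketch below is only an illustration under stated assumptions: regular expressions are nested tuples such as `("cat", T, S)`, states are integers numbered left to right, the start state is 0, and $\mathit{null}_R$ is a boolean. The names `tail` and `nnfa` are ours, and ordinary Python set unions are used, so this does not achieve the unit-time unions discussed in the paper.

```python
import itertools


def tail(R, fresh):
    """Return (Q, I, F, delta, A, null) for tail(M_R), following rules (3)-(7)."""
    op = R[0]
    if op == "lam":                                   # rule (3)
        return set(), set(), set(), set(), {}, True
    if op == "sym":                                   # rule (4): one new state labeled R[1]
        q = next(fresh)
        return {q}, {q}, {q}, set(), {q: R[1]}, False
    if op == "star":                                  # rule (7): overlapping union with F x I
        Q, I, F, d, A, _ = tail(R[1], fresh)
        return Q, I, F, d | {(f, i) for f in F for i in I}, A, True
    Qt, It, Ft, dt, At, nt = tail(R[1], fresh)
    Qs, Is, Fs, ds, As, ns = tail(R[2], fresh)
    if op == "alt":                                   # rule (5)
        return Qt | Qs, It | Is, Ft | Fs, dt | ds, {**At, **As}, nt or ns
    if op == "cat":                                   # rule (6)
        d = dt | ds | {(f, i) for f in Ft for i in Is}
        I = It | (Is if nt else set())
        F = Fs | (Ft if ns else set())
        return Qt | Qs, I, F, d, {**At, **As}, nt and ns
    raise ValueError(op)


def nnfa(R):
    """Formula (2): add start state 0 with an edge to every state in I."""
    Q, I, F, d, A, null = tail(R, itertools.count(1))
    q0 = 0
    return Q | {q0}, q0, F | ({q0} if null else set()), d | {(q0, y) for y in I}, A


# (a|b)*abb has s = 5 symbol occurrences, so the NNFA has s + 1 = 6 states.
R = ("cat", ("cat", ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))),
                     ("sym", "a")), ("sym", "b")), ("sym", "b"))
Q, q0, F, d, A = nnfa(R)
print(len(Q), sorted(d))
```

Because every edge into a state `q` carries the label `A[q]`, the edge relation can be stored as unlabeled pairs, as in Definition 2.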

The  proof  of  Theorem  1  leads  to  McNaughton  and  Yamada's  algorithm.  The  construction  of  label  map 
A  shows  that  when  all  of  the  occurrences  of  alphabet  symbols  appearing  in  the  regular  expression  contain 
distinct  symbols,  then  A  is  one-to-one.  In  this  case,  a  DFA  would  be  produced. 

Analysis determines that this algorithm falls short of optimal performance, because the operation $\delta_T \cup
F_T I_T$ within formula (7) for $\mathit{tail}(M_{T^*})$ is not disjoint; all other unions are disjoint and can be implemented
in unit time. In particular, this overlapping union makes McNaughton and Yamada's algorithm use time
$\Theta(m\sqrt{m}\log m)$ to transform regular expression

\[ ((\cdots((a_1^* a_2^*)^* a_3^*)^* \cdots)^* a_k^*)^* \tag{8} \]

into an NNFA with $k + 1$ states and $m = k^2$ edges.
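
To see where the extra work comes from, here is a worked count. It assumes that expression (8) has the nested form $T_1 = a_1^*$, $T_j = (T_{j-1}\, a_j^*)^*$ (the names $T_j$ and $q_j$ are introduced only for this illustration), that each product $F_T I_T$ is enumerated explicitly, and it ignores the logarithmic factor contributed by the set implementation. Since every $T_j$ accepts $\lambda$, rules (6) and (7) give $I_{T_j} = F_{T_j} = \{q_1, \ldots, q_j\}$, so the star at level $j$ enumerates $j^2$ pairs:

```latex
\[
\sum_{j=1}^{k} |F_{T_j}|\,|I_{T_j}|
  \;=\; \sum_{j=1}^{k} j^2
  \;=\; \frac{k(k+1)(2k+1)}{6}
  \;=\; \Theta(k^3)
  \;=\; \Theta(m\sqrt{m})
  \qquad\text{for } m = \Theta(k^2).
\]
```

Only $\Theta(k^2)$ of these pairs are new edges; the rest are rediscovered at every enclosing star.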

5      Faster  NFA  Construction 

By recognizing the overlapping union $\delta_T \cup F_T I_T$ within formula (7) for $\mathit{tail}(M_{T^*})$ as the source of inefficiency,
we can maintain invariant $\mathit{nred}_T = F_T I_T - \delta_T$ in order to replace the overlapping union by the equivalent
disjoint union $\delta_T \cup \mathit{nred}_T$. In order to maintain $\mathit{nred}_T$ as a component of the tail NNFA computation given
above, we can use the following recursive definition, obtained by simplifying expression $F_R I_R - \delta_R$ and using
the rules from the proof of Theorem 1.

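\[ \mathit{nred}_\lambda = \emptyset \tag{9} \]

\[ \mathit{nred}_a = F_a I_a, \quad \text{where } a \in \Sigma \tag{10} \]

\[ \mathit{nred}_{T|S} = \mathit{nred}_T \cup \mathit{nred}_S \cup F_T I_S \cup F_S I_T \tag{11} \]

\[ \mathit{nred}_{TS} = F_S I_T \cup \mathit{null}_S\,\mathit{nred}_T \cup \mathit{null}_T\,\mathit{nred}_S \tag{12} \]

\[ \mathit{nred}_{S^*} = \emptyset \tag{13} \]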

Rules (9), (10) and (13) are trivial. Rule (11) follows from applying distributive laws to simplify formula

\[ \mathit{nred}_{T|S} = (F_T \cup F_S)(I_T \cup I_S) - (\delta_T \cup \delta_S) \]

Rule (12) is obtained by applying distributive laws to simplify formula

\[ \mathit{nred}_{TS} = (F_S \cup \mathit{null}_S F_T)(I_T \cup \mathit{null}_T I_S) - (\delta_T \cup \delta_S \cup F_T I_S) \]

Each union operation is disjoint and, hence, $O(1)$ time implementable. However, there is a serious loss
of efficiency computing cartesian products in rules (11) and (12). Such products do not contribute edges to
the NNFA for regular expressions $TS$ when these products belong to $\mathit{nred}_T$ and $\mathit{null}_S$ is empty, or when
they belong to $\mathit{nred}_S$ and $\mathit{null}_T$ is empty.
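
The recursive rules (9) to (13) can be checked mechanically against the invariant $\mathit{nred}_R = F_R I_R - \delta_R$. The sketch below is an illustration under assumptions of ours: tuple-encoded expressions, integer states, a boolean for $\mathit{null}_R$, and the names `prod` and `build`. It computes every cartesian product eagerly, so it exhibits exactly the inefficiency described above and is useful only as a correctness check.

```python
import itertools


def prod(X, Y):
    """Cartesian product X x Y as a set of pairs."""
    return {(x, y) for x in X for y in Y}


def build(R, fresh):
    """Return (I, F, delta, null, nred) for tail(M_R); nred follows rules (9)-(13)."""
    op = R[0]
    if op == "lam":
        res = (set(), set(), set(), True, set())                    # rule (9)
    elif op == "sym":
        q = next(fresh)
        res = ({q}, {q}, set(), False, {(q, q)})                    # rule (10)
    elif op == "star":
        I, F, d, _, nred = build(R[1], fresh)
        res = (I, F, d | nred, True, set())                         # disjoint union, rule (13)
    else:
        It, Ft, dt, nt, rt = build(R[1], fresh)
        Is, Fs, ds, ns, rs = build(R[2], fresh)
        if op == "alt":                                             # rule (11)
            res = (It | Is, Ft | Fs, dt | ds, nt or ns,
                   rt | rs | prod(Ft, Is) | prod(Fs, It))
        else:                                                       # "cat", rule (12)
            res = (It | (Is if nt else set()), Fs | (Ft if ns else set()),
                   dt | ds | prod(Ft, Is), nt and ns,
                   prod(Fs, It) | (rt if ns else set()) | (rs if nt else set()))
    I, F, d, _, nred = res
    assert nred == prod(F, I) - d        # invariant nred = F I - delta at every node
    return res


# (a* b*)*: the star adds only the edges that are not already present.
R = ("star", ("cat", ("star", ("sym", "a")), ("star", ("sym", "b"))))
print(build(R, itertools.count(1)))
```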

To overcome this problem we will use lazy evaluation to compute cartesian products only when they
actually contribute edges to the NNFA. Thus, instead of maintaining a union $\mathit{nred}_R$ of cartesian products,
we will maintain a set $\mathit{lazynred}_R$ of pairs of sets. Consequently, the overlapping union $\delta_T \cup F_T I_T$ within
formula (7) for $\mathit{tail}(M_{T^*})$ can be replaced by

\[ \delta_T \cup \Bigl( \bigcup_{[A,B] \in \mathit{lazynred}_T} A \times B \Bigr) \tag{14} \]


However, this solution creates another problem: the sets forming $F$ and $I$, which are computed by the
rules to construct the tail of an NNFA, must be persistent in the following sense. Let the sets in the sequence
forming $F$ (respectively $I$) be called F sets (respectively I sets). Each F set (respectively I set) could be
stored as a first (respectively second) component of a pair belonging to $\mathit{lazynred}$. Given any such pair, we
need to iterate through the I set $S$ stored in the second component of the pair in $O(|S|)$ time.

The sequences of F (respectively I) sets are formed by two operations: 1. create a new singleton set;
and 2. form a new set by taking the disjoint union of two previous sets in the sequence. Clearly, each of
these sequences can be stored as a binary forest in which each subtree in the forest represents a set in the
sequence, where the elements of the set are stored in the frontier. By construction each internal node in the
forest has two children.

We call the forest storing the F sets (respectively I sets) the F forest (respectively I forest). For each
node $n$ belonging to the F forest (respectively I forest), let $\mathit{Fset}(n)$ (respectively $\mathit{Iset}(n)$) denote the F set
(respectively I set) represented by $n$.

Each node in the F and I forests except the roots stores a parent pointer. Each node $n$ in the I forest
also stores a pointer to the leftmost leaf of the subtree rooted in $n$ and a pointer to the rightmost leaf of the
subtree rooted in $n$. The frontier nodes of the I forest are linked.

This data structure preserves the unit-time disjoint union for F and I sets, and supports linear time
iteration through the frontier of any node in the I forest. Since all the F sets and I sets are subsets of the
NFA states $Q$, the F forest and I forest each is stored in $O(|Q|)$ space.
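
A minimal sketch of the I forest follows, assuming that each root takes part in at most one union and that the left operand's leaves precede the right operand's leaves in the frontier. The class and function names (`Node`, `union`, `iterate`) are illustrative and not taken from the paper.

```python
class Node:
    """A node of the I forest: leaves hold states, internal nodes are disjoint unions."""

    def __init__(self, state=None):
        self.state = state
        self.parent = None
        self.left = self     # leftmost leaf of the subtree rooted here
        self.right = self    # rightmost leaf of the subtree rooted here
        self.next = None     # next leaf in the linked frontier


def union(a, b):
    """Disjoint union in unit time: new root over a and b, frontiers spliced together."""
    n = Node()
    a.parent = b.parent = n
    n.left, n.right = a.left, b.right
    a.right.next = b.left
    return n


def iterate(n):
    """Yield the states of Iset(n) in time linear in the size of the set."""
    leaf = n.left
    while True:
        yield leaf.state
        if leaf is n.right:
            return
        leaf = leaf.next


x, y, z = Node(1), Node(2), Node(3)
xy = union(x, y)
xyz = union(xy, z)
print(list(iterate(xy)), list(iterate(xyz)))
```

The sets are persistent in the sense required above: after `xy` is merged into `xyz`, iterating `xy` still stops at its own rightmost leaf.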

Theorem 2 For any regular expression $R$ we can compute $\mathit{lazynred}_R$ in time $O(r)$ and auxiliary space
$O(s)$, where $r$ is the size of regular expression $R$, and $s$ is the number of occurrences of alphabet symbols
appearing in $R$.

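Proof If $T$ and $S$ are two sets, let $\mathit{pair}(T,S) = \{[T,S]\}$ if both $T$ and $S$ are nonempty; otherwise, let
$\mathit{pair}(T,S) = \emptyset$. The proof makes use of the following recursive definition of $\mathit{lazynred}_R$ obtained from the
recursive definition of $\mathit{nred}_R$.

\[ \mathit{lazynred}_\lambda = \emptyset \tag{15} \]

\[ \mathit{lazynred}_a = \mathit{pair}(F_a, I_a), \quad \text{where } a \in \Sigma \tag{16} \]

\[ \mathit{lazynred}_{T|S} = \mathit{lazynred}_T \cup \mathit{lazynred}_S \cup \mathit{pair}(F_T, I_S) \cup \mathit{pair}(F_S, I_T) \tag{17} \]

\[ \mathit{lazynred}_{TS} = \mathit{pair}(F_S, I_T) \cup \mathit{null}_S\,\mathit{lazynred}_T \cup \mathit{null}_T\,\mathit{lazynred}_S \tag{18} \]

\[ \mathit{lazynred}_{S^*} = \emptyset \tag{19} \]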

Operation $\mathit{pair}(T,S)$ takes unit time and space. Each union operation occurring in the rules above is disjoint
and, hence, implementable in unit time. Rule (16) contributes unit time and space for each alphabet symbol
occurring in $R$, or $O(s)$ time and space overall. Rule (17) contributes unit time for each alternation operator
appearing in $R$, or $O(r)$ time overall. It contributes two units of space for each alternation operator both of
whose alternands contain at least one alphabet symbol. Hence, the overall space contributed by this rule
is less than $2s$. By a similar argument, Rule (18) contributes $O(r)$ time and less than $s$ space overall. The
other two rules contribute no more than $O(r)$ time overall. Hence, the time and space needed to compute
$\mathit{lazynred}_R$ are $O(r)$ and $O(s)$ respectively. □

By Theorems 1 and 2, and by the fact that $\mathit{nred}_R$ can be computed from $\mathit{lazynred}_R$ in $O(|\mathit{nred}_R|)$ time
using formula (14), we have our first theoretical result.

Theorem 3 For any regular expression $R$ we can compute an equivalent NNFA with $s + 1$ states in time
$O(r + m)$ and auxiliary space $O(s)$, where $r$ is the size of regular expression $R$, $m$ is the number of edges in
the NNFA, and $s$ is the number of occurrences of alphabet symbols appearing in $R$.
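
A hedged sketch of the lazy construction follows. It keeps $\mathit{lazynred}$ as a list of (F set, I set) pairs per rules (15) to (19) and expands a pair into edges only at a star, per formula (14). For brevity it represents sets as Python `frozenset` values and lists rather than the F and I forests, so the unions here are not unit-time and the sketch does not meet the space bound of Theorem 3; the tuple encoding and the names `pair` and `build` are assumptions of ours.

```python
import itertools

EMPTY = frozenset()


def pair(T, S):
    """pair(T, S) = {[T, S]} if both sets are nonempty, else the empty set."""
    return [(T, S)] if T and S else []


def build(R, fresh):
    """Return (I, F, delta, null, lazynred) for tail(M_R), rules (15)-(19)."""
    op = R[0]
    if op == "lam":
        return EMPTY, EMPTY, set(), True, []                        # rule (15)
    if op == "sym":
        q = frozenset([next(fresh)])
        return q, q, set(), False, pair(q, q)                       # rule (16)
    if op == "star":
        I, F, d, _, lz = build(R[1], fresh)
        for A, B in lz:              # formula (14): expand a pair only when it yields edges
            d |= {(x, y) for x in A for y in B}
        return I, F, d, True, []                                    # rule (19)
    It, Ft, dt, nt, lt = build(R[1], fresh)
    Is, Fs, ds, ns, ls = build(R[2], fresh)
    if op == "alt":                                                 # rule (17)
        return (It | Is, Ft | Fs, dt | ds, nt or ns,
                lt + ls + pair(Ft, Is) + pair(Fs, It))
    d = dt | ds | {(x, y) for x in Ft for y in Is}                  # "cat", rule (6)
    lz = pair(Fs, It) + (lt if ns else []) + (ls if nt else [])     # rule (18)
    return It | (Is if nt else EMPTY), Fs | (Ft if ns else EMPTY), d, nt and ns, lz


R = ("star", ("cat", ("star", ("sym", "a")), ("star", ("sym", "b"))))
I, F, d, null, lz = build(R, itertools.count(1))
print(sorted(d))
```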

6      Improving  Space  for  McNaughton  and  Yamada's  NFA 

Theorem 3 leads to a new algorithm that computes the adjacency form of the NNFA in a single left-to-right
shift/reduce parse of the regular expression $R$. Although this improves upon the algorithm of Berry and
Sethi, McNaughton and Yamada's NNFA has certain theoretical disadvantages compared with Thompson's simpler
NFA. Recall from example (8) that the number of edges in McNaughton and Yamada's machine can be the
square of the number of edges in Thompson's machine (since Thompson's NFA has $m = O(n)$). Consequently,
Thompson's NFA is likely to be more desirable in time and space for DFA construction by subset construction
when the adjacency list implementation is used. We also believe that the bit vector implementation will
rarely be more desirable than the compact adjacency list implementation.

Nevertheless, we can modify the algorithm just given so that in $O(r)$ time it produces an $O(s)$ space
compressed NFA that encodes McNaughton and Yamada's NNFA, and that supports acceptance testing in
$O(s|x|)$ time. In the same way that $\mathit{nred}_R$ was represented more compactly as $\mathit{lazynred}_R$, we can represent
$\delta_R$, which is a union of cartesian products, as a set $\mathit{lazy}\delta_R$ of pairs of set-valued arguments of these products.
If $M_R$ is the NNFA equivalent to regular expression $R$, then the rules for $\mathit{tail}(M_R)$ are given just below:

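\[ \mathit{lazy}\delta_\lambda = \emptyset \tag{20} \]

\[ \mathit{lazy}\delta_a = \emptyset \tag{21} \]

\[ \mathit{lazy}\delta_{T|S} = \mathit{lazy}\delta_T \cup \mathit{lazy}\delta_S \tag{22} \]

\[ \mathit{lazy}\delta_{TS} = \mathit{pair}(F_T, I_S) \cup \mathit{lazy}\delta_T \cup \mathit{lazy}\delta_S \tag{23} \]

\[ \mathit{lazy}\delta_{S^*} = \mathit{lazy}\delta_S \cup \mathit{lazynred}_S \tag{24} \]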

After the preceding rules are processed we can obtain a representation for $M_R$ by introducing a new state
$q_0$ and by adding $\mathit{pair}(\{q_0\}, I_R)$ to $\mathit{lazy}\delta$ in accordance with formula (2).


Consequently, if $T$ is a subset of the NFA states $Q$, then we can compute the collection of sets $\delta(T, a)$ for
all of the alphabet symbols $a \in \Sigma$ as follows. First we compute

\[ \mathit{finddomain}(T) = \{X : [X,Y] \in \mathit{lazy}\delta \mid T \cap X \ne \emptyset\} \]

which is used to find the set of next states

\[ \mathit{next\_states}(T) = \{Y : [X,Y] \in \mathit{lazy}\delta \mid X \in \mathit{finddomain}(T)\} \]

Finally, for each alphabet symbol $a \in \Sigma$, we see that

\[ \delta(T,a) = \{q : Y \in \mathit{next\_states}(T),\ q \in Y \mid A(q) = a\} \]
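
The rules (20) to (24) and the three queries above can be sketched as follows. This is only an illustration under assumptions of ours: expressions are nested tuples, sets are `frozenset` values instead of forest nodes, the start state is 0, and `step` scans every pair of $\mathit{lazy}\delta$, so it does not achieve the bound of Theorem 6. The names `build`, `compress`, `step` and `accepts` are hypothetical.

```python
import itertools

EMPTY = frozenset()


def pair(T, S):
    return [(T, S)] if T and S else []


def build(R, fresh, A):
    """Return (I, F, null, lazynred, lazydelta) for tail(M_R); A collects state labels."""
    op = R[0]
    if op == "lam":
        return EMPTY, EMPTY, True, [], []                            # rules (15), (20)
    if op == "sym":
        q = next(fresh)
        A[q] = R[1]
        s = frozenset([q])
        return s, s, False, pair(s, s), []                           # rules (16), (21)
    if op == "star":
        I, F, _, lz, ld = build(R[1], fresh, A)
        return I, F, True, [], ld + lz                               # rules (19), (24)
    It, Ft, nt, lt, dt = build(R[1], fresh, A)
    Is, Fs, ns, ls, ds = build(R[2], fresh, A)
    if op == "alt":                                                  # rules (17), (22)
        return It | Is, Ft | Fs, nt or ns, lt + ls + pair(Ft, Is) + pair(Fs, It), dt + ds
    lz = pair(Fs, It) + (lt if ns else []) + (ls if nt else [])      # rule (18)
    return (It | (Is if nt else EMPTY), Fs | (Ft if ns else EMPTY),
            nt and ns, lz, pair(Ft, Is) + dt + ds)                   # rule (23)


def compress(R):
    """Build lazydelta for M_R, adding pair({q0}, I_R) for the new start state q0 = 0."""
    A = {}
    I, F, null, _, ld = build(R, itertools.count(1), A)
    return ld + pair(frozenset([0]), I), F | ({0} if null else EMPTY), A


def step(lazydelta, A, T, a):
    """delta(T, a) via finddomain and next_states, using a naive scan of all pairs."""
    next_states = [Y for X, Y in lazydelta if X & T]
    return {q for Y in next_states for q in Y if A[q] == a}


def accepts(R, x):
    ld, F, A = compress(R)
    T = {0}
    for a in x:
        T = step(ld, A, T, a)
    return bool(T & F)


# (a|b)*abb
R = ("cat", ("cat", ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))),
                     ("sym", "a")), ("sym", "b")), ("sym", "b"))
print(accepts(R, "aabb"), accepts(R, "abab"))
```

No edge is ever materialized here: each entry of `lazydelta` stands for the whole product of its two component sets.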

In order to explain how $\mathit{lazy}\delta$ is implemented, we will use some additional terminology. For each F set
$S$ represented by node $n$ in the F forest, $n$ stores a pointer to a list of nodes in the I forest representing
set $\{Y : [S,Y] \in \mathit{lazy}\delta\}$. Furthermore, the F and I forests are compressed to only store nodes representing
sets that appear as the first or second components of a pair $[X,Y] \in \mathit{lazy}\delta$. In other words, we make $\mathit{lazy}\delta$
a total onto binary relation. This can be achieved on-line as the F and I forests are constructed by a kind
of path compression that affects the preprocessing time and space by no more than a small constant factor.
Thus, we have

Theorem 4 For any regular expression $R$, its equivalent compressed NFA, consisting of F forest, I forest
and $\mathit{lazy}\delta$, takes up $O(s)$ space and can be computed in time $O(r)$ and auxiliary space $O(s)$.

Proof Since each internal node in the F and I forests has at least two children, and since their leaves
are distinct occurrences of alphabet symbols, they take up $O(s)$ space. Each of the unions in the rules to
compute $\mathit{lazy}\delta$ is disjoint, and hence takes unit time. By the same argument used to analyze the overall space
contributed by Rule (18) in the proof of Theorem 2, we see that Rule (23) contributes $O(s)$ space and $O(r)$
time overall to $\mathit{lazy}\delta$. By Rule (19), Theorem 2, and a simple application of structural induction, we also see
that the space contributed by Rule (24) (which results from adding $\mathit{lazynred}$ to $\mathit{lazy}\delta$) overall is $O(s)$. The
overall time bound for each rule is easily seen to be $O(r)$. □

The compressed NFA also supports an efficient evaluation of the three preceding queries in order to
simulate transition map $\delta$. The best previous worst case time bound for taking a subset $T$ of states as input and
computing the collection of sets $\delta(T,a)$ for all of the alphabet symbols $a \in \Sigma$ is $\Theta(|T| \times |\delta(T, \Sigma)|)$ using an
adjacency list implementation of McNaughton and Yamada's NFA, or $\Theta(r)$ using Thompson's NFA.

In Theorem 6 we improve this bound, and obtain, essentially, optimal asymptotic time without exceeding
$O(s)$ space. This is our main theoretical result. It explains the apparent superior performance of acceptance
testing  using  our  compressed  NFA  over  Thompson's.  It  explains  more  convincingly  why  constructing  a  DFA 
starting  from  our  compressed  NFA  is  at  least  one  order  of  magnitude  faster  than  when  we  start  from  either 
Thompson's  or  McNaughton  and  Yamada's  NFA.  These  empirical  results  are  presented  in  section  8. 



Before  proving  the  theorem,  we  will  first  prove  the  following  technical  lemma. 

Lemma 5 Let $T$ be a set of states in the compressed NNFA built from regular expression $R$, and $\mathit{lazy}\delta_T =
\{[X,Y] : [X,Y] \in \mathit{lazy}\delta \mid X \cap T \ne \emptyset\}$. Then $|\mathit{lazy}\delta_T| = O(|T| + |\delta(T,\Sigma)|)$.

Proof The result follows from proving that $O(|T| + |\delta(T, \Sigma)|)$ is a bound for each of the subsets of $\mathit{lazy}\delta_T$
contributed by rules (16), (17), (18), and (23) respectively. The bound holds for subsets contributed by rules
(16), (17), and (23), because they form one-to-one maps.

The proof for the subset contributed by (18) is split into two cases. For convenience, let $T_Q$ denote the
set of states in $T$ such that their corresponding symbol occurrences appear in regular expression $Q$, where
$Q$ is a subexpression of $R$. First, consider the set $A$ of pairs $[F_S, I_Q] \in \mathit{lazy}\delta_T$ for subexpressions $QS$, where
$T_Q = \emptyset$. We claim that these edges form a one-to-many map, which implies the bound. Suppose this were
not the case. Then we would have a subexpression $QS$, and a subexpression $LP$ of $Q$ such that $I_Q = I_L$ and
pairs $[F_S, I_Q]$ and $[F_P, I_L]$ belonging to $A$. However, since $Q$ contains no occurrence of an alphabet symbol
in $T$, then $P$ does not either. Hence, the pair $[F_P, I_L]$ cannot belong to $A$. Hence, the claim holds.

Next, consider the set $B$ of pairs $[F_S, I_Q] \in \mathit{lazy}\delta_T$ for subexpressions $QS$, where $T_Q \ne \emptyset$. Proceeding
from inner-most to outer-most subexpression $QS$, we charge each pair $[F_S, I_Q] \in B$ to an uncharged state
in $T_Q$. A simple structural induction would show that $T_Q$ contains at least one uncharged state. Let $LP$
be an inner-most subexpression in $R$ such that $T_L$ is nonempty, and $[F_P, I_L] \in \mathit{lazy}\delta_T$. Then both $T_L$ and
$T_P$ contain at least one uncharged state. After an uncharged state in $T_L$ is charged, $T_{LP}$ still contains an
uncharged state from $T_P$. The inductive step is similar. The result follows. □

Theorem 6 Given any subset $T$ of the NNFA states, we can compute all of the sets $\delta(T,a)$ for every
alphabet symbol $a \in \Sigma$ in time $O(|T| + |\delta(T,\Sigma)|)$.

Proof The sets belonging to $\mathit{finddomain}(T)$ are represented by all the nodes $P_T$ along the paths
from the states belonging to $T$ to the roots of the F forest. These nodes $P_T$ can be found in
$O(|T| + |P_T|)$ time by a marked traversal of parent pointers in the forest. Observe that $|P_T|$ can be
much larger than $|T|$.
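
The marked traversal can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dictionary `parent` (node to parent, with roots mapped to `None`) and the function name `find_domain_nodes` are assumptions made for the example, and the starting leaves are counted as part of the result.

```python
def find_domain_nodes(parent, T):
    """Collect every forest node on a path from a state in T up to its root.

    `parent` is a hypothetical encoding of the F forest: it maps each node
    to its parent, and maps roots to None. A node is marked the first time
    it is reached, so each climb stops as soon as it meets a path that was
    already explored. Total work is proportional to |T| plus the number of
    marked nodes.
    """
    marked = set()
    for state in T:
        node = state
        while node is not None and node not in marked:
            marked.add(node)
            node = parent[node]
    return marked
```

Because a climb stops at the first marked node, shared ancestors are visited once no matter how many states of $T$ lie below them.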

Computing $\mathit{next\_states}(T)$ involves two steps. First, for each node $n \in P_T$, we traverse a
nonempty list of nodes in the I forest representing $\{Y : [\mathit{Fset}(n), Y] \in \mathit{lazy}\delta\}$.
This step takes time linear in the sum of the lengths of these lists. (Observe that this number can be much
larger than $|P_T|$.) Second, if $D_T$ is the set of all nodes in the I forest belonging to these lists, then
$\mathit{next\_states}(T) = \{\mathit{Iset}(n) : n \in D_T\}$. We can compute the set
$\mathit{next\_states}(T)$ in
$O(|\{[\mathit{Fset}(n),Y] : n \in P_T, [\mathit{Fset}(n),Y] \in \mathit{lazy}\delta\}|)$ time, which is
$O(|T| + |\delta(T,\Sigma)|)$ time by Lemma 5.

Calculating $\delta(T,\Sigma)$ involves computing the union of the sets belonging to
$\mathit{next\_states}(T)$. This is achieved in $O(|\delta(T,\Sigma)|)$ time using the left and right
descendant pointers stored in each node belonging to $D_T$, traversing the unmarked leaves in the frontier,
and marking leaves as they are traversed. Multiset discrimination [7] can be used to separate out all of the
sets $\{q \in \delta(T,\Sigma) \mid A(q) = a\}$ for each $a \in \Sigma$ in time $O(|\delta(T,\Sigma)|)$. □
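
The final separation step can be illustrated with a simplified bucket scheme. This is only a sketch in the spirit of multiset discrimination, not the algorithm of [7]: it assumes symbols are encoded as small integers, and the class name `SymbolSplitter` and the map `label` (playing the role of $A$) are hypothetical.

```python
class SymbolSplitter:
    """Split a list of states into groups that share the same symbol.

    The bucket array is allocated once, indexed by integer symbol codes.
    Each call touches only the buckets it uses and resets them afterwards,
    so one call costs time linear in the number of states passed in,
    independent of the alphabet size.
    """

    def __init__(self, alphabet_size):
        self._buckets = [None] * alphabet_size

    def split(self, states, label):
        touched = []                      # symbols seen in this call, in order
        for q in states:
            a = label[q]
            if self._buckets[a] is None:
                self._buckets[a] = []
                touched.append(a)
            self._buckets[a].append(q)
        groups = [(a, self._buckets[a]) for a in touched]
        for a in touched:                 # reset only what was used
            self._buckets[a] = None
        return groups
```

Resetting only the touched buckets is what keeps repeated calls from paying for the whole alphabet each time.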

Consider  an  NFA  constructed  from  the  following  regular  expression: 


k  '', 


(A|(A|(...(A|a)-r)..rr 

In order to follow transitions labeled 'a', we have to examine $O(n^2)$ edges and $O(n)$ states in
$O(n^2)$ time for McNaughton and Yamada's NFA, $\Theta(kn)$ states and edges in $\Theta(kn)$ time for
Thompson's machine, and $O(n)$ states and edges in $O(n)$ time for our compressed NFA.

7       Further  Optimization 

In this section, we introduce a simple transformation that can greatly improve the compressed NFA
representation. If $\mathit{lazy}\delta$ contains both $[R,U]$ and $[S,U]$, and if there exists an F set
$T = R \cup S$, then we can replace $[R,U]$ and $[S,U]$ within $\mathit{lazy}\delta$ by a single pair
$[T,U]$. Similarly, if $\mathit{lazy}\delta$ contains both $[U,R]$ and $[U,S]$, and if there exists an I set
$T = R \cup S$, then we can replace $[U,R]$ and $[U,S]$ within $\mathit{lazy}\delta$ by a single pair
$[U,T]$. We call this technique packing. In a single linear time bottom up traversal of the I forests and
the F forests, we can simplify $\mathit{lazy}\delta$ by packing. In the case of regular expression
$(a_1|a_2|\cdots|a_n)^*$, packing can simplify $\mathit{lazy}\delta$ from $3n - 2$ pairs into a single pair
(see Figure 1).
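
The packing rule for F sets can be sketched as below. This is a naive fixpoint illustration under stated assumptions, not the linear time bottom up traversal described above: sets are modeled as Python `frozenset` values, `f_sets` is a hypothetical collection of the F sets present in the forest, and the function name `pack_sources` is invented for the example. The rule for I sets is symmetric.

```python
from itertools import combinations


def pack_sources(lazy_delta, f_sets):
    """Merge pairs [R, U] and [S, U] into [R | S, U] whenever R | S is an F set.

    `lazy_delta` is a set of (source, target) pairs of frozensets and
    `f_sets` is the set of frozensets that exist as F sets. The loop
    repeats until no pair of entries can be merged; every merge removes
    at least one entry, so it terminates.
    """
    pairs = set(lazy_delta)
    changed = True
    while changed:
        changed = False
        for (r, u1), (s, u2) in combinations(list(pairs), 2):
            if u1 == u2 and (r | s) in f_sets:
                pairs -= {(r, u1), (s, u2)}
                pairs.add((r | s, u1))
                changed = True
                break                     # restart the scan on the updated set
    return pairs
```

For example, with $R = \{1\}$, $S = \{2\}$, and $U = \{3\}$, the two pairs collapse to one only when $\{1,2\}$ is itself available as an F set; otherwise they are left unchanged.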


[Figure 1 diagram: the F-tree and the I-tree.]

Figure 1: Compressed NFA of $(a_1|a_2|\cdots|a_n)^*$ after packing F-sets and I-sets.

At the same time, we can carry out the same kind of path compression described in Section 6, so that the
F and I forests only contain nodes in the domain (respectively, range) of $\mathit{lazy}\delta$. However,
whereas previously the forest leaves (corresponding to NFA states) were unaffected by compression, the
packing transformation can remove leaves in the F and I forests from the domain and (respectively) range of
$\mathit{lazy}\delta$. When path compression eliminates leaves, we need to turn the symbol assignment map
$A$ into a multi-valued mapping; that is, whenever leaves $q_1, \ldots, q_k$ are replaced by leaf $q$, we
take the following steps:

• remove the old leaves $q_1, \ldots, q_k$ from the domain of $A$;

• assign the set of symbols $\{x : s \in \{q_1, \ldots, q_k\}, [s,x] \in A\}$ to $A$ at $q$.
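
The two steps above can be sketched as a small update on a dictionary. This is an illustration only: representing $A$ as a Python `dict` from leaf to a set of symbols, and the function name `merge_leaf_labels`, are assumptions made for the example.

```python
def merge_leaf_labels(A, old_leaves, new_leaf):
    """Replace `old_leaves` by `new_leaf` in the multi-valued symbol map A.

    A maps each leaf to the set of symbols assigned to it. The old leaves
    are removed from the domain of A, and the union of their symbol sets
    is assigned to the new leaf. A is updated in place and returned.
    """
    symbols = set()
    for s in old_leaves:
        symbols |= A.pop(s)               # step 1: drop the old leaf
    A[new_leaf] = symbols                 # step 2: assign the collected symbols
    return A
```

With this encoding, a transition into the merged leaf is allowed only when the current input symbol is a member of the set stored at that leaf.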

As an example of this, consider regular expression $(a_1|a_2|\cdots|a_n)^*$ once again. Path compression
will turn the data structure shown in Figure 1 into the one depicted in Figure 2. In using our compressed
representation to simulate an NFA, the transition edge $t$ (see Figure 2) can be taken only if the current
transition symbol belongs to $\{a_1, a_2, \ldots, a_n\}$, which labels node $c_1$.

[Figure 2 diagram: the F-tree and the I-tree, with a node labeled $\{a_1, a_2, \ldots, a_n\}$.]

Figure 2: Compressed NFA of $(a_1|a_2|\cdots|a_n)^*$ after Packing and Path Compression.

Packing and path compression can not only speed up acceptance testing but also improve DFA construction
dramatically. In the remainder of this paper, we call our optimized compressed NFA representation CNFA.

8       Performance  Benchmark 

Experiments to benchmark the performance of the CNFA have been carried out for a range of regular
expression patterns against a number of machines including Thompson's NFA, an optimized form of Thompson's
NFA, and McNaughton and Yamada's NFA [13]. We build Thompson's NFA according to the construction rules
described in [2]. Thompson's NFA usually contains an excessive number of redundant states and
$\lambda$-edges. However, to our knowledge there is no obvious, efficient algorithm to optimize Thompson's
NFA without violating the linear space constraint. We therefore devise some simple but effective
transformations that eliminate redundant states and edges in most of the test cases.

Our acceptance testing experiments show that the CNFA outperforms Thompson's NFA, Thompson's NFA
optimized, and McNaughton and Yamada's NFA. For regular expression $(a|b|\cdots)^*$, the CNFA is 12 times
faster than Thompson's NFA, 2 times faster than Thompson's NFA optimized, and 50% faster than McNaughton
and Yamada's NFA. For regular expression $((a|\lambda)(b|\lambda)\cdots)^*$, which is equivalent to
$(a|b|\cdots)^*$, the CNFA is 16 times faster than Thompson's NFA, 8 times faster than Thompson's NFA
optimized, and 50% faster than McNaughton and Yamada's NFA. For regular expression
$((a|\lambda)(b|\lambda)\cdots\text{-})^*$, which accepts zero or more instances of an ordered string
followed by a '-', the CNFA is 2 times faster than Thompson's NFA, 25% faster than Thompson's NFA
optimized, but 80% slower than McNaughton and Yamada's NFA. For
((a|A)"-)",  the  CNFA  is  comparable  to  Thompson's  machine,  50%  slower  than  Thompson's  NFA  optimized, 
and linearly faster than McNaughton and Yamada's NFA.¹ For $(abc\cdots)$ and $(abc\cdots)^*$, the CNFA is
75% slower than Thompson's NFA and McNaughton and Yamada's NFA, and 55% slower than Thompson's NFA
optimized. However, acceptance testing for concatenation is quite fast for each of the NFA's being compared,
and would not degrade our speedup ratio in general. Acceptance testing with a realistic programming
language pattern shows that the CNFA is 7 times faster than Thompson's NFA, 60% faster than Thompson's
NFA optimized, and 2 times faster than McNaughton and Yamada's NFA.

The benchmark for subset construction is more favorable. The CNFA outperforms the other machines
not only in DFA construction time but also in constructed machine size. Subset construction is compared
among the following five starting machines: the CNFA, Thompson's NFA, Thompson's NFA optimized,
Thompson's NFA using the important-state heuristic [2], and McNaughton and Yamada's NFA. Below is a high
level modified specification of the classical Rabin and Scott subset construction for producing a DFA
$\sigma$ from an NFA $\delta$:

$\sigma := \emptyset$
workset $:= \{\{q_0\}\}$
while $\exists S \in$ workset do
    workset := workset $- \{S\}$
    for each symbol $a \in \Sigma$ and set of states $B = \{x \in \delta(S,\Sigma) \mid A(x) = a\}$ where $B \neq \emptyset$ do
        $\sigma(S,a) := B$
        $B := \epsilon$-closure$(B)$
        if $B$ does not belong to $\sigma$ then
            workset := workset $\cup \{B\}$
        end if
    end for
end while
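
As a concrete illustration, the specification above can be written as a short program for an NFA without $\epsilon$-edges, in which each state carries the symbol that labels it (the role of the map $A$). This is a sketch under those assumptions, not the benchmarked implementation: the names `subset_construction`, `delta`, and `A` are chosen for the example, the $\epsilon$-closure step is omitted because it is only needed for Thompson's NFA, and $\delta(S,\Sigma)$ is computed naively rather than by the procedure of Theorem 6.

```python
def subset_construction(q0, delta, A):
    """Build DFA transitions from an epsilon-free NFA with labeled states.

    delta maps an NFA state to an iterable of successor states, and
    A maps an NFA state to the symbol labeling it. Returns the DFA start
    state, the transition map sigma[(S, a)] = B, and the set of DFA states,
    where DFA states are frozensets of NFA states.
    """
    start = frozenset([q0])
    sigma = {}
    seen = {start}
    workset = [start]
    while workset:
        S = workset.pop()
        # Group the successors of S by the symbol that labels them.
        by_symbol = {}
        for q in S:
            for x in delta.get(q, ()):
                by_symbol.setdefault(A[x], set()).add(x)
        for a, members in by_symbol.items():
            B = frozenset(members)
            sigma[(S, a)] = B
            if B not in seen:             # new DFA state: schedule it
                seen.add(B)
                workset.append(B)
    return start, sigma, seen
```

For instance, for $(a|b)^*a$ with initial state 0 and labeled states 1 ('a'), 2 ('b'), and 3 ('a'), where states 0, 1, and 2 each lead to 1, 2, and 3, the construction yields three DFA states.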


We implemented the preceding specification tailored to the CNFA and other machines. The only differences
in these implementations are in the calculation of $\delta(S,\Sigma)$, where we use the efficient procedure
described by Theorem 6, and in the $\epsilon$-closure step, which is performed only for Thompson's NFA. The
CNFA achieves linear speedup and constructs a linearly smaller DFA in many of the test cases. See Figures 3
and 4 for a benchmark summary. The raw timing data is given in the Appendix. All the tests described in
this paper are performed on a lightly loaded SUN 3/250 server. We used the getitimer() and setitimer()
primitives [19] to measure program execution time. It is interesting to note that the CNFA has a better
speedup ratio on SUN Sparc based computers.

¹ McNaughton and Yamada's NFA suffers from a quadratic number of edges in this test pattern.


pattern                                       TNFA               TNFA (imp. state)  opt. TNFA          MYNFA
--------------------------------------------  -----------------  -----------------  -----------------  ---------------
$(abc\cdots)^*$                               5 times faster     comparable         comparable         comparable
$(a|b|\cdots)^*$                              quadratic speedup  linear speedup     linear speedup     linear speedup
$(0|1|\cdots 9)^n$                            70 times faster    10 times faster    20% faster         10 times faster
$((a|\lambda)(b|\lambda)\cdots\text{-})^*$    linear speedup     20% faster         linear speedup     5% faster
$((a|\lambda)(b|\lambda)\cdots)^*$            quadratic speedup  linear speedup     quadratic speedup  linear speedup
$(a|b)^*a(a|b)^n$                             2.5 times faster   comparable         10% slower         50% faster
programming language                          800 times faster   6 times faster     60% faster         6 times faster

Figure 3: CNFA Subset Construction Speedup Ratio


pattern                                       TNFA               TNFA (imp. state)  opt. TNFA          MYNFA
--------------------------------------------  -----------------  -----------------  -----------------  ----------------
$(abc\cdots)^*$                               comparable         comparable         comparable         comparable
$(a|b|c\cdots)^*$                             linearly smaller   linearly smaller   comparable         linearly smaller
$(0|1|\cdots 9)^n$                            200 times smaller  10 times smaller   comparable         10 times smaller
$((a|\lambda)(b|\lambda)\cdots\text{-})^*$    3 times smaller    comparable         comparable         comparable
$((a|\lambda)(b|\lambda)\cdots)^*$            linearly smaller   linearly smaller   linearly smaller   linearly smaller
$(a|b)^*a(a|b)^n$                             4 times smaller    comparable         comparable         comparable
programming language                          10 times smaller   5 times smaller    20% larger         5 times smaller

Figure 4: DFA Size Improvement Ratio Starting from the CNFA


9      Conclusion 

Theoretical analysis and confirming empirical evidence demonstrate that our proposed CNFA leads to a
substantially more efficient way of turning regular expressions into DFA's (and minimum state DFA's in
particular) than other NFA's in current use. It would be interesting future research to analyze the effect of
packing and path compression on the CNFA. It would also be worthwhile to obtain a sharper analysis of the
constant factors in comparing the CNFA with other NFA's.

APPENDIX: Benchmark Results ²

Acceptance  Testing  Benchmark 

$(abc\cdots)$


length 

TNFA 

TNFA 

unopt 
CNFA 

CNFA 

MYNFA 

TNFA    V. 
CNFA 

opt.    TNFA   ve 
CNFA 

MYNFA    v« 
CNFA 

unopt-    CNFA    vs 
CNFA 

1000 

0,14 

0  34 

0  58 

0  76 

0.18 

0.18 

0.45 

0,34 

0,76 

1500 

0,20 

0  52 

1.00 

1    18 

0.30 

0.18 

0.44 

0,25 

0,84 

3000 

0  30 

0  74 

1.33 

1-58 

0.43 

0.19 

0.47 

0.27 

0  64 

3500 

0.36 

0  90 

1.60 

3.00 

0.54 

0.19 

0,45 

0.27 

0-80 

3000 

0  44 

112 

3.00 

3  44 

0.66 

0.18 

0,46 

0,27 

082 

4000 

0,64 

1   54 

2.58 

3.32 

0  64 

0.19 

0,46 

0,25 

0  78 

4300 

0,73 

1,56 

3.76 

3-43 

0-96 

0.31 

0,46 

036 

0,83 

5000 

0  70 

1   46 

3.66 

3  56 

0.92 

0.20 

0,42 

0,26 

0,81 

$(abc\cdots)^*$


l«Dg1b 

TNFA 

TNFA 

anopt 
CNFA 

CNFA 

MYNFA 

TNFA    V. 
CNFA 

opt-    TNFA    V. 
CNFA 

MYNFA   vs 
CNFA 

unopl,    CNFA    vb 
CNFA 

10 

0,33 

0,63 

110 

1,56 

0.33 

0.31 

0.40 

0,21 

0.71 

30 

0,36 

0,60 

113 

1   44 

0,34 

0   18 

0,43 

0.24 

0.78 

30 

0  38 

0  64 

1.13 

1   42 

0,36 

0.30 

0.45 

0,25 

0  79 

40 

0  28 

0  64 

1.08 

1   44 

0,34 

0,19 

0.44 

'^,34 

0-75 

50 

0  36 

0  64 

1.08 

I-4B 

0,38 

0,18 

0.43 

0,26 

0.73 

60 

0.28 

0,64 

1.13 

1-50 

0,34 

0,19 

0.43 

0,33 

0.75 

70 

0,36 

0  66 

1    10 

1-46 

0.34 

0,18 

0.45 

0,33 

0  75 

60 

0,36 

0,64 

1,10 

1-46 

0  32 

0,18 

0.44 

0.22 

0  75 

90 

0,38 

0.64 

1,14 

1.46 

0,36 

0   19 

0.44 

0-25 

0  78 

100 

0,36 

0  64 

1,10 

1   46 

0-34 

0   18 

0.44 

0.23 

0  75 

$(a|b|c\cdots)^*$


length 

TNFA 

TNFA 

unopt, 
CNFA 

CNFA 

MYNFA 

TNFA    Vi 
CNFA 

opt     TNFA   va 
CNFA 

MYNFA   VB 

CNFA 

unopt     CNFA    vs 
CNFA 

10 

5-46 

1,36 

4,30 

1   76 

0,62 

3.10 

0,77 

0  47 

2,39 

20 

10-46 

3   18 

7,53 

3.02 

1   36 

5  19 

1,08 

0.66 

3.73 

30 

15.70 

3,04 

10,86 

3.18 

1.66 

4,65 

1.39 

0.85 

4.96 

40 

21.16 

3  76 

14  26 

3.56 

3-42 

8  37 

1.47 

0.95 

5-58 

50 

26,32 

4   60 

17.26 

3  64 

3  00 

9  33 

1-63 

1   06 

6  06 

60 

31-62 

5  46 

33-56 

3.13 

3.66 

10, 13 

1.75 

1.17 

7  33 

70 

36  62 

6,30 

33.94 

3  36 

4.36 

11, 33 

1-90 

134 

7,34 

80 

42,02 

7   13 

27.36 

3  56 

5  33 

11.34 

3  00 

1,47 

7  69 

©0 

47  94 

7-92 

30  44 

3.90 

6,00 

13, 39 

3.03 

1.54 

7,81 

100 

53,00 

6  70 

35   10 

4-10 

6,88 

12  68 

2-13 

1.68 

8.56 

² All tests are performed on a SUN 3/250 server. Benchmark time is in seconds.

iia\xr-r


Unsth 

TNFA 

TNFA 

anopt 
CNFA 

CNFA 

MYNFA 

TNFA    V. 
CNFA 

opt.    TNFA    vt 
CNFA 

MYNFA    vi 
CNFA 

unopt     CNFA    vb 
CNFA 

10 

7.14 

4   30 

5.50 

6.10 

3.96 

0  88 

0.53 

0  49 

0.70 

20 

13  94 

7.14 

9.14 

13.76 

12.14 

0.94 

0.52 

0-S6 

0.69 

30 

19  90 

10.60 

13.76 

20  74 

26   12 

0-96 

0.51 

1.36 

0.66 

40 

35.92 

13.90 

17.06 

36.33 

42.16 

0.99 

0.49 

161 

0.65 

50 

31.46 

16-82 

33.36 

34. 26 

66.54 

0.92 

0.49 

1   94 

0  65 

60 

36.10 

16-96 

34.96 

39.76 

91.74 

0-91 

0.46 

2.31 

0  63 

70 

43.56 

32-96 

39,54 

46. 04 

127.36 

0-95 

0.50 

3,76 

0  64 

eo 

51,18 

35-96 

35.20 

53.60 

171,02 

0.95 

0.46 

3.19 

0  66 

90 

52  66 

36  80 

35.54 

54.34 

187.56 

1.01 

0.51 

3,46 

0.66 

100 

61-30 

31.12 

41.00 

63-44 

248.04 

0.97 

0.49 

3.91 

0.65 

$((a|\lambda)(b|\lambda)\cdots)^*$


length 

TNFA 

TNFA 

unopt 
CNFA 

CNFA 

MYNFA 

TNFA   VI 
CNFA 

opt     TNFA   V. 
CNFA 

MYNFA   v» 
CNFA 

unopt.    CNFA    v» 
CNFA 

10 

7,08 

3 

92 

4.26 

1.94 

0.86 

3  56 

2.02 

0.44 

2.20 

30 

13.06 

7 

42 

7.60 

2.06 

1.38 

6.34 

3.60 

0.67 

3-69 

30 

19.92 

10 

96 

10.78 

3.30 

1  92 

8  66 

4.77 

0-63 

4  69 

40 

36.32 

14 

38 

14.16 

3.53 

2  44 

10  44 

5.71 

0  97 

5.62 

50 

32  32 

18 

00 

17.34 

3  84 

3.00 

11.36 

6  34 

1  06 

6   11 

60 

37  68 

21 

66 

30.78 

3  10 

3.66 

13  15 

6.99 

1.18 

6.70 

70 

43  82 

25 

12 

24-06 

3.24 

4   34 

13  52 

7  75 

1   34 

7.42 

80 

51.54 

28 

48 

27  78 

3.58 

5   18 

14.40 

7-96 

1   45 

7.75 

90 

57.80 

32 

08 

30.60 

3.86 

5.90 

14  89 

8  27 

1.52 

7.94 

100 

64   56 

35 

46 

33.96 

4  06 

6  86 

15.90 

6.73 

1.69 

8.37 

$((a|\lambda)(b|\lambda)\cdots\text{-})^*$


length 

TNFA 

TNFA 

QDOpt 

CNFA 

CNFA 

MYNFA 

TNFA    v» 
CNFA 

opt-    TNFA    vt 
CNFA 

MYNFA  VB 

CNFA 

unopt.    CNFA    vb 
CNFA 

10 

4.40 

3  82 

2  66 

3  36 

0-66 

1-31 

0.64 

0.20 

0.79 

20 

6.06 

4,94 

4.08 

4.86 

1.00 

1-73 

1.02 

0  21 

0.83 

30 

13-30 

7,16 

5.50 

6-54 

1-34 

1.88 

1.09 

0-20 

0.64 

40 

16  06 

9, 32 

6  72 

7.96 

1.64 

3.01 

1.16 

0.31 

0-87 

60 

19.22 

11.34 

6.10 

9.34 

1.86 

3.05 

1.20 

0.30 

0.87 

60 

33  46 

13  90 

9-60 

11.04 

3.38 

3.13 

1.26 

0.23 

0.89 

70 

37  32 

16   18 

11.32 

12  68 

2. 64 

3.15 

1-26 

0.22 

0-89 

80 

31.90 

18-16 

12.72 

14.34 

3.16 

3.32 

1.27 

0.22 

0.69 

90 

34.60 

19-92 

13  64 

15.34 

3  40 

2.37 

1   30 

0  22 

0.69 

100 

36  98 

22.04 

14.96 

18.50 

3-76 

2.10 

1    19 

0  20 

0.81

Subset Construction Benchmark


$(abc\cdots)^*$


Construction  Time 


length 

TNFA 

TNFA 
(imp  >t&t«) 

opl.  TNFA 

UDopt.  CNFA 

CNFA 

MYNFA 

TNFA  (imp-  Hile)  vb 
CNFA 

Opt.  TNFA  vt 
CNFA 

MYNFA  V. 
CNFA 

100 

0.04 

0  03 

0.03 

0.04 

0  03 

0.03 

1.00 

1.00 

1.00 

300 

0  13 

0.04 

0.06 

0.06 

0,04 

0.06 

1.00 

1,50 

1.50 

300 

0.16 

0.06 

0.04 

0.06 

0.04 

0.08 

1.50 

1,00 

3,00 

400 

0  38 

0.10 

0.08 

0.08 

0-10 

0.08 

1.00 

0.80 

0.80 

500 

0  40 

0.13 

0,10 

0.13 

0-13 

0.10 

1,00 

0,83 

0.83 

600 

0.56 

0.13 

0.14 

0.14 

0.14 

0.14 

0.86 

1,00 

1.00 

700 

0.74 

0  14 

0.18 

0.14 

0.14 

0.30 

1,00 

1.29 

1.43 

800 

1.04 

0  30 

0.30 

030 

0,18 

0.34 

1.11 

1.11 

1.33 

900 

1.13 

0.18 

0.33 

0.33 

0,18 

0.22 

1.00 

1.23 

1.33 

1000 

1-73 

0.36 

0.30 

0.30 

0.33 

0,33 

118 

0-91 

1-00 

Constructed  DFA  Size 


m&chiQC 

length 

nod«  no 

edge  no 

node  weight 

edge  weight 

length 

node  no 

edge  DO     n 

ode  weight 

edge  weight 

TNFA 

100 

101 

101 

106 

103 

300 

301 

201 

205 

303 

TNFA  (imp.  itatc) 

100 

101 

101 

101 

101 

300 

301 

301 

201 

201 

opt-  TNFA 

100 

100 

100 

100 

100 

300 

300 

300 

300 

300 

unopt  CNFA 

100 

101 

101 

101 

101 

300 

301 

301 

201 

301 

CNFA 

100 

101 

101 

101 

101 

200 

301 

201 

301 

301 

MYNFA 

100 

101 

101 

101 

101 

300 

301 

201 

301 

301 

TNFA 

300 

301 

301 

305 

303 

400 

401 

401 

405 

403 

TNFA  (imp  ttklc) 

300 

301 

301 

301 

301 

400 

401 

401 

401 

401 

opt-  TNFA 

300 

300 

300 

300 

300 

400 

400 

400 

400 

400 

unopt  CNFA 

300 

301 

301 

301 

301 

400 

401 

401 

401 

401 

CNFA 

300 

301 

301 

301 

301 

400 

401 

401 

401 

401 

MYNFA 

300 

301 

301 

301 

301 

400 

401 

401 

401 

401 

TNFA 

500 

501 

501 

605 

503 

600 

601 

601 

605 

603 

TNFA  (imp  tiAte) 

500 

501 

501 

501 

501 

600 

601 

601 

601 

601 

opt  TNFA 

500 

500 

500 

500 

500 

600 

600 

600 

600 

600 

unopt  CNFA 

500 

501 

501 

501 

501 

600 

601 

601 

601 

601 

CNFA 

500 

501 

501 

501 

501 

600 

601 

601 

601 

601 

MYNFA 

500 

501 

501 

501 

501 

600 

601 

601 

601 

601 

TNFA 

700 

701 

701 

705 

703 

800 

801 

801 

805 

803 

TNFA  (imp  •l«te) 

700 

701 

701 

701 

701 

800 

801 

801 

801 

601 

opt  TNFA 

700 

700 

700 

700 

700 

800 

800 

800 

800 

800 

unopt  CNFA 

700 

701 

701 

701 

701 

800 

801 

601 

601 

601 

CNFA 

700 

701 

701 

701 

701 

600 

801 

801 

801 

601 

MYNFA 

700 

701 

TOl 

701 

701 

800 

801 

801 

801 

801 

TNFA 

900 

901 

901 

905 

903 

1000 

1001 

1001 

1005 

1003 

TNFA  fimp  •\m\t) 

900 

901 

901 

901 

901 

1000 

1001 

1001 

1005 

1003 

opt  TNFA 

900 

900 

900 

900 

900 

1000 

1000 

1000 

1000 

1000 

unopt  CNFA 

900 

901 

901 

901 

901 

1000 

1001 

1001 

1005 

1003 

CNFA 

900 

901 

901 

901 

901 

1000 

1001 

1001 

1005 

1003 

MYNFA 

900 

901 

901 

901 

901 

1000 

1001 

1001 

1005 

1003 



$(a|b|c\cdots)^*$


Construction  Time 


length 

TNFA 

TNFA 
(imp.   .tate) 

opt.    TNFA 

unopt.   CNFA 

CNFA 

MYNFA 

TNFA   (imp,   .tdtc)   vs 
CNFA 

opt     TNFA    V3 
CNFA 

MYNFA   VB 
CNFA 

10 

0  08 

0.02 

0.00 

0.02 

0  00 

0.02 

- 

1-00 

- 

20 

0.92 

0,06 

0.00 

006 

0.02 

0.04 

3.00 

0-00 

2  00 

30 

4  44 

0.08 

0.02 

O.OB 

0.02 

0.08 

4.00 

1.00 

4.00 

40 

13.70 

0.14 

000 

0.16 

0.00 

0.16 

- 

1.00 

- 

50 

29.94 

0  24 

0.02 

0.24 

0.02 

0.26 

12. 00 

1.00 

14  00 

60 

61.18 

0   46 

0.03 

0.36 

0.02 

0.30 

23.00 

1.00 

15,00 

70 

111.68 

0.50 

0.04 

0.46 

0.02 

0-46 

25  00 

3.00 

33  00 

80 

168  74 

0-64 

0-04 

0.54 

0  02 

0-58 

33  00 

3.00 

29,00 

90 

300,73 

0.76 

006 

0.78 

0.02 

0.74 

39-00 

3.00 

37.00 

100 

450.90 

0.96 

006 

0  94 

0  02 

0.92 

46.00 

3.00 

46  00 

Constructed  DFA  Size 


m&chine 

length 

node   no 

edge   no 

node  weight 

edge  weight 

length 

node  DO 

edge  no 

node  weight 

edge  weight 

TNFA 

10 

11 

110 

385 

3904 

30 

31 

430 

1070 

21609 

TNFA    (imp,   *l»te) 

10 

11 

110 

11 

110 

30 

31 

420 

21 

420 

opt.   TNFA 

10 

1 

10 

1 

10 

30 

1 

20 

20 

unopt.    CNFA 

10 

11 

110 

11 

110 

20 

21 

420 

21 

420 

CNFA 

10 

2 

20 

2 

20 

20 

2 

40 

40 

MYNFA 

10 

11 

110 

11 

110 

20 

21 

420 

21 

420 

TNFA 

30 

31 

930 

2355 

71114 

40 

41 

1640 

4140 

166419 

TNFA   (imp.    >iate) 

30 

31 

930 

31 

930 

40 

41 

1640 

41 

1640 

opt,    TNFA 

30 

1 

30 

1 

30 

40 

1 

40 

40 

unopt.    CNFA 

30 

31 

930 

31 

930 

40 

41 

1640 

41 

1640 

CNFA 

30 

2 

60 

3 

60 

40 

2 

80 

80 

MYNFA 

30 

31 

930 

31 

930 

40 

41 

1640 

41 

1640 

TNFA 

50 

51 

3550 

6425 

322524 

60 

61 

3660 

9210 

554429 

TNFA   (imp.   «tate) 

50 

51 

3550 

51 

2550 

60 

61 

3660 

61 

3660 

opt,    TNFA 

50 

1 

50 

1 

50 

60 

1 

60 

60 

nnopl.   CNFA 

50 

51 

3550 

51 

2550 

60 

61 

3660 

61 

3660 

CNFA 

50 

2 

100 

2 

100 

60 

3 

120 

120 

MYNFA 

50 

51 

2550 

51 

2550 

60 

61 

3660 

61 

3660 

TNFA 

70 

71 

4970 

13495 

877134 

80 

81 

6480 

16280 

1305639 

TNFA    (imp     •tale) 

70 

71 

4970 

71 

4970 

80 

81 

6480 

81 

6480 

opt     TNFA 

70 

1 

70 

1 

70 

80 

1 

80 

80 

unopt.    CNFA 

TO 

71 

4970 

71 

4970 

80 

81 

6480 

81 

6480 

CNFA 

70 

2 

140 

3 

140 

80 

3 

160 

160 

MYNFA 

70 

71 

4970 

71 

4970 

80 

61 

6460 

81 

6480 

TNFA 

90 

91 

8190 

2DS65 

1654944 

100 

101 

10100 

25350 

2540049 

TNFA   (imp     .tale) 

90 

91 

8190 

91 

8190 

100 

101 

10100 

101 

10100 

opt     TNFA 

90 

1 

90 

1 

90 

100 

1 

100 

100 

unopt.    CNFA 

90 

91 

8190 

91 

8190 

100 

101 

10100 

101 

10100 

CNFA 

90 

2 

160 

2 

180 

100 

2 

200 

300 

MYNFA 

90 

91 

8190 

91 

8190 

100 

101 

10100 

101 

10100 



$(0|1|\cdots 9)^n$


Construction  Time 


length 

TNFA 

TNFA 
(it&p  Bt&te) 

opl-  TNFA 

ODopt.  CNFA 

CNFA 

MYNFA 

TNFA  (imp.  •t*te)  v« 
CNFA 

opt.  TNFA  vi 
CNFA 

MYNFA  VB 
CNFA 

35 

1  54 

0  38 

004 

020 

0  03 

0,36 

14-00 

2  00 

13.00 

50 

3  16 

0  48 

0.06 

0.44 

0  04 

0,53 

12-00 

1,50 

13  00 

75 

•1  63 

0  78 

0  06 

0.68 

0  06 

0,80 

13  00 

1,33 

13,33 

100 

6.50 

1  03 

0-13 

0.90 

0.13 

1,00 

8  50 

1-00 

8.33 

135 

8  16 

1.36 

0.14 

1.08 

0  10 

1,30 

12,60 

1,40 

12.00 

150 

9-80 

1.60 

0.16 

1.36 

0  13 

1,54 

13  33 

1.50 

12.83 

175 

11  43 

1  70 

0  20 

1.46 

0.14 

1,73 

12  14 

1.43 

12.29 

300 

13.30 

2  06 

0  34 

1  74 

0  18 

1  94 

11  44 

1.33 

10.78 

335 

14  98 

3.34 

0.34 

2. 28 

0  20 

3,32 

11  70 

1.20 

11.60 

350 

16,30 

2.58 

0.28 

2  52 

0.33 

2-70 

11.73 

1-37 

12.27 

Constructed  DFA  Size 


mftchioe 

length 

node  DO 

edge  no 

Dode  weight 

edge  weight 

length 

node  DO 

edge  DO 

node  weight 

edge  weight 

TNFA 

35 

251 

3410 

5039 

57004 

50 

501 

4910 

12039 

116004 

TNFA  (imp  •Iftte) 

35 

351 

2410 

351 

2410 

50 

501 

4910 

501 

4910 

opt  TNFA 

35 

26 

350 

26 

350 

50 

51 

500 

51 

500 

unopt.  CNFA 

35 

351 

2410 

351 

3410 

50 

501 

4910 

501 

4910 

CNFA 

25 

36 

250 

36 

350 

50 

51 

500 

51 

500 

MYNFA 

25 

351 

3410 

251 

2410 

50 

501 

4910 

501 

4910 

TNFA 

75 

751 

7410 

16139 

179004 

100 

1001 

9910 

24239 

340004 

TNFA  (imp  itMe) 

75 

751 

7410 

751 

7410 

100 

1001 

9910 

1001 

9910 

opt.  TNFA 

75 

76 

750 

76 

750 

100 

101 

1000 

101 

1000 

unopt  CNFA 

75 

751 

7410 

751 

7410 

100 

1001 

9910 

1001 

9910 

CNFA 

75 

76 

750 

76 

750 

100 

101 

1000 

101 

1000 

M  Y  N  FA 

75 

751 

7410 

751 

7410 

100 

1001 

9910 

1001 

9910 

TNFA 

135 

1351 

13410 

30339 

301004 

150 

1501 

14910 

36439 

363004 

TNFA  (imp  ilite) 

135 

1351 

13410 

1251 

12410 

150 

1501 

14910 

1501 

14910 

opt  TNFA 

135 

136 

1350 

126 

1250 

150 

151 

1500 

151 

1500 

unopt  CNFA 

135 

1351 

13410 

1251 

12410 

150 

1501 

14910 

1501 

14910 

CNFA 

135 

126 

1350 

126 

1350 

150 

151 

1500 

151 

1500 

MYNFA 

135 

1351 

13410 

1251 

13410 

150 

1501 

14910 

1501 

14910 

TNFA 

175 

1751 

17410 

42539 

433004 

300 

2001 

19910 

4S639 

464004 

TNFA  (imp  •t»te) 

175 

1751 

17410 

1751 

17410 

200 

3001 

19910 

2001 

19910 

opt  TNFA 

175 

176 

1750 

176 

1750 

200 

301 

2000 

301 

2000 

unopt  CNFA 

175 

1751 

17410 

1751 

17410 

300 

3001 

19910 

3001 

19910 

CNFA 

175 

176 

1750 

176 

1750 

300 

301 

3000 

301 

2000 

MYNFA 

ITS 

1751 

17410 

1751 

17410 

200 

3001 

19910 

2001 

19910 

TNFA 

225 

2351 

33410 

54739 

545004 

350 

3501 

34910 

60639 

606004 

TNFA  (imp  •t«te) 

225 

3351 

33410 

2251 

33410 

350 

2501 

34910 

3501 

34910 

opt  TNFA 

225 

336 

3350 

326 

3350 

350 

351 

3500 

351 

3500 

unopt  CNFA 

225 

2251 

33410 

3351 

23410 

250 

3501 

34910 

3501 

24910 

CNFA 

225 

226 

3350 

336 

3350 

350 

351 

3500 

351 

2500 

MYNFA 

325 

2251 

22410 

3351 

33410 

350 

3501 

34910 

3501 

24910 



$((a|\lambda)(b|\lambda)\cdots\text{-})^*$


Construction  Time 


length 

TNFA 

TNFA 
(imp  atftte) 

opt  TNFA 

onopt.  CNFA 

CNFA 

MYNFA 

TNFA  (imp.  atate)  v« 
CNFA 

opt.  TNFA  vi 
CNFA 

MYNFA  vt 
CNFA 

as 

0.34 

0.04 

0.10 

0.04 

0.04 

0,04 

1,00 

2.50 

1.00 

50 

3,86 

0.14 

0-40 

0-12 

0,12 

0.14 

1.17 

3.33 

1.17 

75 

18-16 

0.30 

1.34 

0.24 

0.28 

0  28 

1.07 

4.79 

1  00 

100 

55.50 

0.48 

2.90 

0  44 

0.46 

0-50 

1.04 

6.30 

1.09 

125 

132.52 

0  80 

5.72 

0.66 

076 

0-74 

1,05 

7,53 

0.97 

150 

270,34 

1.14 

9.24 

0.96 

1.02 

1.06 

112 

9,06 

1.04 

175 

496.94 

1-54 

14.64 

1.26 

1-38 

1,50 

1.12 

10,61 

1,09 

200 

839.94 

2.04 

20.94 

1.62 

1,76 

1,88 

1.16 

11.90 

1.07 

225 

1392.42 

3.20 

33.02 

2.12 

2,40 

2-44 

1,33 

13.34 

1.02 

250 

2065.14 

3.16 

41.62 

2.76 

2.74 

2,90 

1.15 

15,19 

1.06 

Constructed  DFA  Size 


m&chine 

length 

node  no 

edge  no 

node  weight 

edge  weight 

length 

node  no 

edge  no 

node  weight 

edge  weight 

TNFA 

25 

27 

351 

1027 

8476 

50 

52 

1281 

3928 

65076 

TNFA  (imp,  .tAte) 

25 

27 

351 

27 

351 

50 

52 

1261 

53 

1326 

opt.  TNFA 

25 

27 

351 

27 

351 

50 

52 

1281 

53 

1326 

unopt.  CNFA 

25 

27 

351 

37 

351 

50 

52 

1281 

53 

1326 

CNFA 

25 

27 

351 

27 

351 

50 

52 

1361 

53 

1326 

MYNFA 

25 

37 

351 

37 

351 

50 

52 

1361 

53 

1326 

TNFA 

75 

77 

3881 

8703 

316676 

100 

103 

5106 

15353 

610151 

TNFA  (imp.  itite) 

75 

77 

3881 

78 

3936 

100 

103 

5106 

103 

5151 

opt.  TNFA 

75 

77 

2881 

78 

2936 

100 

102 

5106 

103 

5151 

anopt,  CNFA 

75 

77 

2881 

78 

2926 

100 

102 

5106 

103 

5151 

CNFA 

75 

77 

3881 

78 

3936 

100 

103 

5106 

103 

5151 

MYNFA 

75 

77 

2881 

78 

3926 

100 

103 

5106 

103 

5151 

TNFA 

125 

137 

7956 

23878 

992376 

150 

152 

11431 

34376 

1710226 

TNFA  (imp,  itatc) 

125 

127 

7956 

126 

8001 

150 

152 

11431 

153 

11476 

opt  TNFA 

135 

137 

7956 

138 

8001 

150 

152 

11431 

153 

11476 

unopt,  CNFA 

135 

137 

7956 

138 

8001 

150 

152 

11431 

153 

11476 

CNFA 

125 

127 

7956 

136 

8001 

150 

153 

11431 

153 

11476 

MYNFA 

125 

127 

7956 

136 

8001 

150 

153 

11431 

153 

11476 

TNFA 

175 

177 

15531 

46553 

3710576 

200 

202 

20256 

60703 

4040301 

TNFA  (imp.  itAte) 

175 

177 

15531 

178 

15576 

300 

202 

20256 

203 

30301 

opt,  TNFA 

175 

177 

15531 

178 

15576 

300 

303 

20356 

203 

30301 

unopt.  CNFA 

175 

177 

15531 

176 

15576 

300 

303 

30356 

203 

20301 

CNFA 

175 

177 

15531 

176 

15576 

200 

303 

30356 

203 

20301 

MYNFA 

175 

177 

15531 

176 

15576 

200 

202 

20356 

303 

20301 

TNFA 

335 

227 

35606 

76728 

5746376 

250 

252 

31581 

94636 

7875376 

TNFA  (imp  state) 

225 

227 

35606 

226 

35651 

250 

353 

31581 

253 

31626 

opt  TNFA 

335 

237 

35606 

238 

25651 

350 

353 

31581 

253 

31636 

unopt  CNFA 

235 

227 

35606 

326 

25651 

350 

252 

31581 

253 

31636 

CNFA 

325 

337 

35606 

338 

25651 

250 

362 

31561 

353 

31636 

MYNFA 

225 

237 

25606 

336 

35651 

350 

352 

31581 

353 

31626 



$((a|\lambda)(b|\lambda)\cdots)^*$


Construction  Time 


Icoglh 

TNFA 

TNFA 
(imp.    •tftle) 

opl     TNFA 

UDopt.    CNFA 

CNFA 

MYNFA 

TNFA   (imp-   ttate)   vb 
CNFA 

opl.    TNFA    V9 
CNFA 

MYNFA   V. 
CNFA 

10 

0.14 

0  03 

0,04 

0.03 

0-00 

0.02 

30 

1   44 

0.04 

0,14 

0.04 

0  00 

0.06 

30 

6  34 

0  10 

0-38 

0.10 

0.03 

0.06 

5.00 

14-00 

3.00 

40 

19,30 

0  16 

0-58 

014 

0.00 

0.16 

- 

- 

50 

4S  64 

0  36 

110 

0.33 

0.03 

0.34 

13  00 

55.00 

13.00 

60 

93  00 

0  36 

1  84 

0.33 

003 

0,36 

18  00 

92.00 

16.00 

70 

170  80 

0  46 

3  68 

0-43 

0.03 

0,48 

24.00 

144.00 

34-00 

ao 

387  4S 

0  68 

4  34 

0-36 

0.03 

0.63 

34  00 

312.00 

31   00 

90 

437  OS 

0.78 

5.88 

0  73 

0.03 

0.74 

39.00 

394.00 

37.00 

100 

693,34 

0  00 

6  00 

0.93 

0.03 

1,06 

45.00 

400.00 

53.00 

Constructed  DFA  Size 


machine            length  node no  edge no  node weight  edge weight    length  node no  edge no  node weight  edge weight
TNFA                   10       11      110          363         3630        20       21      420         1323        26460
TNFA (imp. state)      10       11      110           11          110        20       21      420           21          420
opt. TNFA              10       10      100           10          100        20       20      400           20          400
unopt. CNFA            10       11      110           11          110        20       21      420           21          420
CNFA                   10        2       20            2           20        20        2       40            2           40
MYNFA                  10       11      110           11          110        20       21      420           21          420

TNFA                   30       31      930         2883        86490        40       41     1640         5043       201720
TNFA (imp. state)      30       31      930           31          930        40       41     1640           41         1640
opt. TNFA              30       30      900           30          900        40       40     1600           40         1600
unopt. CNFA            30       31      930           31          930        40       41     1640           41         1640
CNFA                   30        2       60            2           60        40        2       80            2           80
MYNFA                  30       31      930           31          930        40       41     1640           41         1640

TNFA                   50       51     2550         7803       390150        60       61     3660        11163       669780
TNFA (imp. state)      50       51     2550           51         2550        60       61     3660           61         3660
opt. TNFA              50       50     2500           50         2500        60       60     3600           60         3600
unopt. CNFA            50       51     2550           51         2550        60       61     3660           61         3660
CNFA                   50        2      100            2          100        60        2      120            2          120
MYNFA                  50       51     2550           51         2550        60       61     3660           61         3660

TNFA                   70       71     4970        15123      1058610        80       81     6480        19683      1574640
TNFA (imp. state)      70       71     4970           71         4970        80       81     6480           81         6480
opt. TNFA              70       70     4900           70         4900        80       80     6400           80         6400
unopt. CNFA            70       71     4970           71         4970        80       81     6480           81         6480
CNFA                   70        2      140            2          140        80        2      160            2          160
MYNFA                  70       71     4970           71         4970        80       81     6480           81         6480

TNFA                   90       91     8190        24843      2235870       100      101    10100        30603      3060300
TNFA (imp. state)      90       91     8190           91         8190       100      101    10100          101        10100
opt. TNFA              90       90     8100           90         8100       100      100    10000          100        10000
unopt. CNFA            90       91     8190           91         8190       100      101    10100          101        10100
CNFA                   90        2      180            2          180       100        2      200            2          200
MYNFA                  90       91     8190           91         8190       100      101    10100          101        10100
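
The MYNFA rows above fit a simple count, sketched here under an assumption the table does not state: that the pattern is ((a1|λ)(a2|λ)...(an|λ))* over n pairwise distinct symbols. Every factor is nullable and the concatenation is starred, so in the position NFA any symbol may follow any other. The machine is then already deterministic, and subset construction produces only singleton subsets: n + 1 nodes and n(n + 1) edges. The helper names `position_nfa` and `dfa_size` are illustrative and are not taken from the implementation measured here.

```python
def position_nfa(n):
    """Transitions of a position NFA for ((a1|λ)(a2|λ)...(an|λ))*.

    Assumption: the n symbols are pairwise distinct. State 0 is the start
    state and state i (1 <= i <= n) is the position of symbol i. Every
    factor is nullable and the concatenation is starred, so from every
    state symbol i leads to position i.
    """
    return {q: {sym: {sym} for sym in range(1, n + 1)} for q in range(n + 1)}


def dfa_size(delta):
    """Count nodes and edges, checking that the NFA is already deterministic."""
    for moves in delta.values():
        for targets in moves.values():
            # Exactly one target per (state, symbol): each DFA state is a
            # singleton subset, so subset construction adds nothing.
            assert len(targets) == 1
    nodes = len(delta)
    edges = sum(len(moves) for moves in delta.values())
    return nodes, edges


for n in (10, 20, 100):
    print(n, dfa_size(position_nfa(n)))
# 10 (11, 110)
# 20 (21, 420)
# 100 (101, 10100)
```

Because every subset is a singleton, the node weight and edge weight in those rows coincide with the node and edge counts, if weight is read as the total subset size.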



(a|b)*a(a|b)^n


Construction  Time 


length    TNFA  TNFA (imp. state)  opt. TNFA  unopt. CNFA  CNFA  MYNFA  TNFA (imp. state) vs CNFA  opt. TNFA vs CNFA  MYNFA vs CNFA
     1    0.00               0.00       0.00         0.00  0.00   0.00                       0.00               1.00           1.00
     2    0.02               0.02       0.02         0.00  0.02   0.00                       1.00               1.00           0.00
     3    0.04               0.02       0.02         0.00  0.02   0.00                       1.00               1.00           0.00
     4    0.02               0.02       0.02         0.00  0.02   0.02                       1.00               1.00           1.00
     5    0.06               0.02       0.04         0.02  0.02   0.04                       1.00               2.00           2.00
     6    0.16               0.04       0.06         0.06  0.06   0.08                       0.67               1.00           1.33
     7    0.30               0.14       0.12         0.14  0.14   0.14                       1.00               0.86           1.00
     8    0.70               0.26       0.22         0.30  0.28   0.36                       0.93               0.79           1.29
     9    1.58               0.54       0.50         0.76  0.56   0.94                       0.96               0.89           1.68
    10    3.54               1.14       1.06         2.26  1.22   3.34                       0.93               0.87           2.74
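
The three rightmost columns appear to be each machine's construction time divided by the CNFA time. The length-10 row above is consistent with that reading (1.14/1.22 ≈ 0.93, 1.06/1.22 ≈ 0.87, 3.34/1.22 ≈ 2.74). A minimal sketch of the calculation follows; the function name `ratio_vs_cnfa` and the choice to return `None` when the CNFA time is zero are assumptions.

```python
def ratio_vs_cnfa(machine_time, cnfa_time):
    """Ratio of a machine's construction time to the CNFA time.

    Returns None when the CNFA time is zero (assumed handling: the
    ratio is undefined in that case).
    """
    if cnfa_time == 0:
        return None
    return round(machine_time / cnfa_time, 2)


# Times from the length-10 row of the table above.
cnfa = 1.22
row = {"TNFA (imp. state)": 1.14, "opt. TNFA": 1.06, "MYNFA": 3.34}
for name, t in row.items():
    print(name, ratio_vs_cnfa(t, cnfa))
# TNFA (imp. state) 0.93
# opt. TNFA 0.87
# MYNFA 2.74
```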

Constructed  DFA  Size 


machine            length  node no  edge no  node weight  edge weight    length  node no  edge no  node weight  edge weight
TNFA                    1        5       10           39           83         2        9       18           89          183
TNFA (imp. state)       1        5       10            9           19         2        9       18           21           43
opt. TNFA               1        4        8            8           16         2        8       16           20           40
unopt. CNFA             1        5       10            9           19         2        9       18           21           43
CNFA                    1        5       10            9           19         2        9       18           21           43
MYNFA                   1        5       10            9           19         2        9       18           21           43

TNFA                    3       17       34          205          415         4       33       66          469          943
TNFA (imp. state)       3       17       34           49           99         4       33       66          113          227
opt. TNFA               3       16       32           48           96         4       32       64          112          224
unopt. CNFA             3       17       34           49           99         4       33       66          113          227
CNFA                    3       17       34           49           99         4       33       66          113          227
MYNFA                   3       17       34           49           99         4       33       66          113          227

TNFA                    5       65      130         1061         2127         6      129      258         2373         4751
TNFA (imp. state)       5       65      130          257          515         6      129      258          577         1155
opt. TNFA               5       64      128          256          512         6      128      256          576         1152
unopt. CNFA             5       65      130          257          515         6      129      258          577         1155
CNFA                    5       65      130          257          515         6      129      258          577         1155
MYNFA                   5       65      130          257          515         6      129      258          577         1155

TNFA                    7      257      514         5253        10511         8      513     1026        11525        23055
TNFA (imp. state)       7      257      514         1281         2563         8      513     1026         2817         5635
opt. TNFA               7      256      512         1280         2560         8      512     1024         2816         5632
unopt. CNFA             7      257      514         1281         2563         8      513     1026         2817         5635
CNFA                    7      257      514         1281         2563         8      513     1026         2817         5635
MYNFA                   7      257      514         1281         2563         8      513     1026         2817         5635

TNFA                    9     1025     2050        25093        50191        10     2049     4098        54277       108559
TNFA (imp. state)       9     1025     2050         6145        12291        10     2049     4098        13313        26627
opt. TNFA               9     1024     2048         6144        12288        10     2048     4096        13312        26624
unopt. CNFA             9     1025     2050         6145        12291        10     2049     4098        13313        26627
CNFA                    9     1025     2050         6145        12291        10     2049     4098        13313        26627
MYNFA                   9     1025     2050         6145        12291        10     2049     4098        13313        26627
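
The MYNFA figures above can be reproduced for small n with a textbook subset construction over a hand-built position NFA for (a|b)*a(a|b)^n. The sketch below is not the implementation measured in this report. It assumes that node weight is the total size of the NFA-state subsets labelling the DFA nodes, and that edge weight is the total size of the target subsets over all DFA edges; both readings of the column headers are inferred from the numbers and are not stated in the table. The state names (`start`, `sa`, `sb`, `m`, `(k, c)`) are hypothetical.

```python
from collections import deque


def position_nfa(n):
    """Position NFA for (a|b)*a(a|b)^n.

    "sa"/"sb" are the a and b inside the star, "m" is the middle a, and
    (k, c) is the occurrence of symbol c in the k-th trailing (a|b).
    """
    delta = {q: {"a": {"sa", "m"}, "b": {"sb"}} for q in ("start", "sa", "sb")}

    def step(k):
        # Moves from level k (level 0 is "m") into level k + 1; positions
        # at level n are final and have no outgoing moves.
        return {c: {(k + 1, c)} for c in "ab"} if k < n else {}

    delta["m"] = step(0)
    for k in range(1, n + 1):
        for c in "ab":
            delta[(k, c)] = step(k)
    return delta


def subset_construction(delta, start="start", alphabet="ab"):
    """Return (node no, edge no, node weight, edge weight) of the DFA."""
    start_set = frozenset([start])
    seen = {start_set}
    queue = deque([start_set])
    edges = edge_weight = 0
    while queue:
        subset = queue.popleft()
        for sym in alphabet:
            target = frozenset(t for q in subset for t in delta[q].get(sym, ()))
            if not target:
                continue  # no move on this symbol; no dead state is added
            edges += 1
            edge_weight += len(target)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    node_weight = sum(len(s) for s in seen)
    return len(seen), edges, node_weight, edge_weight


for n in (1, 2, 3):
    print(n, subset_construction(position_nfa(n)))
# 1 (5, 10, 9, 19)
# 2 (9, 18, 21, 43)
# 3 (17, 34, 49, 99)
```

The node count 2^(n+1) + 1 arises because each non-initial DFA state records the last n + 1 input symbols, plus one extra node for the start state.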
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